Abstract

View-oriented group communication services are widely used for fault-tolerant distributed com- 
puting. For applications involving coherent data, it is important to know when a process has a 
primary view of the current group membership, usually defined as a view containing a majority 
out of a static universe of processes. For high availability in a system where processes can join and 
leave routinely, some researchers have suggested defining primary views dynamically, depending 
on having enough members in common with recent views. 

We present a new formal automaton specification, DVS, for the safety guarantees made by a 
practical group communication service providing a dynamic notion of primary view. The specifi- 
cation is a simple automaton, with only seven kinds of actions. We demonstrate the value of DVS 
by showing both how it can be implemented and how it can be used in an application. Both pieces 
are shown formally, with assertional proofs. 

First, we present a distributed algorithm based on a group membership algorithm of Lotem, 
Keidar and Dolev; our version integrates communication with the membership service, uses infor- 
mation from the application processes saying when a view has been prepared for computation by 
the application, and uses a static view-oriented service internally. We prove that this algorithm 
implements DVS, in the sense of trace inclusion. 

Second, we present an application algorithm that is a variant of an algorithm of Amir, Dolev, 
Keidar, Melliar-Smith and Moser, modified to use DVS instead of a static service. We prove that 
it implements a (non-group-oriented) totally-ordered-broadcast service. 

1 Introduction 

Applications designed for distributed systems must cope with failures, because in practical settings
failures are very likely to occur. Coping with failures in a distributed system, however, is not an
easy task. A convenient approach is to use general-purpose building blocks that provide powerful
distributed services and facilitate the construction of applications. One such building block is a
view-oriented group communication service.

* This research was supported by the following contracts: ARPA F19628-95-C-0118, AFOSR F49620-97-1-0337,
NSF 9225124-CCR, and NSF ITR 0121277.

† Dipartimento di Informatica ed Applicazioni, Universita di Salerno, 84081 Baronissi (SA), Italy. This author
is also a member of Akamai's Office of Strategy and Technology.

‡ Basser Department of Computer Science, Madsen Building F09, University of Sydney, NSW 2006, Australia.

§ Dept. of Computer Science and Eng., 191 Auditorium Rd., Unit 3155, University of Connecticut, Storrs, CT
06269, USA and MIT Laboratory for Computer Science, 545 Technology Square, NE43-371, Cambridge, MA 02139,
USA. The work of this author was in part supported by a NSF Career Award and by the NSF Grant 9988304.

Such a service enables application processes located at different nodes of a fault-prone dis- 
tributed network to operate collectively as a group, using the service to multicast messages to 
all members of the group. Examples of view-oriented group communication services are found in 
Isis [5], Transis [13], Totem [31], Newtop [16], Relacs [2], and Horus [34]. 

Solutions to practical, real world problems have benefited from group communication services. 
Isis-based software has been used to provide reliable group communication services for the New 
York Stock Exchange, for the Swiss Electronic Bourse and for the French Air Traffic Control 
System [6]. 

The heart of a group communication service is a group membership service, which provides
each group member with a view of the group; a view includes a list of the processes that are
members of the group. Views are crucial because they describe which processes participate in
the computation, and the system allows them to cooperate by guaranteeing that messages sent by
a process in one view are delivered only to processes in the membership of that view, and only
when they have the same view. Within each view, the service offers guarantees about the order
and reliability of message delivery. Each particular group communication service has its
own set of properties that it offers to the user. A survey of group communication services
that describes the guarantees made by each service appears in [37].

For maximum usefulness, system building blocks should have simple and precise specifications 
of their guaranteed behavior. Producing good specifications for view-oriented group communi- 
cation services is difficult, because these services can be complicated, and because different such 
services provide different guarantees about safety, performance, and fault-tolerance. Examples 
of specifications for group membership services and view-oriented group communication services 
appear in [3, 4, 7, 9, 14, 17, 18, 19, 20, 30, 32, 35, 36]. 

In [17], we presented a specification, VS, for a view-oriented group communication service.
This specification consists of a simple state machine expressing safety requirements, plus a timed 
trace property expressing conditional performance and fault-tolerance requirements. We used 
this specification as the basis for proving the correctness of a complex totally-ordered-broadcast 
algorithm based on [22, 1]. In ensuing work, Chockler has used a version of VS to model and verify 
an adaptive totally-ordered-broadcast algorithm [8], Lesley and Fekete [25] have proved that a 
version of an algorithm of Cristian and Schmuck [10] implements VS, and Khazan [23, 24] has 
used VS in the design of a load-balancing database algorithm. 

The VS service produces arbitrary views, with arbitrary membership sets. However, in many 
applications of VS, especially those with strong data coherence requirements, the application 
processes perform significant computations only when they have a special type of view called a 
primary view. For example, a replicated database application might only perform a read or write 
operation within a primary view, in order to ensure that each read receives the result of the last 
preceding write, in some consistent order of the operations. In this setting, a primary view is 
typically defined to be one whose membership comprises a majority of the universe of processes, 
or more generally, a quorum in a pre-defined quorum set in which all pairs of quorums intersect. 
The intersection property permits information flow from any previous primary to a newly formed 
one. 
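
As an illustration of the static notion only, the following minimal Python sketch enumerates all majority subsets of a small universe and checks that every pair intersects. The five-process universe and the names `is_static_primary`, `UNIVERSE`, `PRIMARIES` and `ALL_PAIRS_INTERSECT` are assumptions made for this example; they do not appear in the specification.

```python
from itertools import combinations


def is_static_primary(members, universe):
    """Static rule: a view is primary if it holds a strict majority of the universe."""
    members, universe = set(members), set(universe)
    return members <= universe and 2 * len(members) > len(universe)


# Hypothetical five-process universe, used only for illustration.
UNIVERSE = {"p1", "p2", "p3", "p4", "p5"}

# Enumerate every subset of the universe that qualifies as a primary view.
PRIMARIES = [
    set(c)
    for k in range(len(UNIVERSE) + 1)
    for c in combinations(sorted(UNIVERSE), k)
    if is_static_primary(c, UNIVERSE)
]

# Intersection property: any two majorities share at least one process.
ALL_PAIRS_INTERSECT = all(a & b for a, b in combinations(PRIMARIES, 2))
```

The shared process is what allows information to flow from an earlier primary to a later one.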



Pre-defined quorum sets can yield efficient implementations in settings where the system 
configuration is relatively static. However, they work less well in settings where the configuration 
evolves over time, with processes joining and leaving the system. For such a setting, a dynamic 
notion of primary is needed, one that can change to conform with the system configuration. A 
dynamic notion of primary still needs to maintain some kind of intersection property, in order 
to permit enough information flow between successive primary views to achieve coherence. For 
example, each primary view might have to contain at least a majority of the processes in the 
previous primary view. Several dynamic voting schemes have been developed to define primaries 
adaptively [12, 15, 21, 26, 33]. 
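
The example rule above (each primary contains a majority of the previous primary) can be sketched in Python as follows. This is a simplified illustration that assumes every process agrees on what the previous primary is, which is exactly the assumption that fails in a real distributed setting; the names `holds_majority_of`, `is_valid_primary_chain` and `CHAIN` are hypothetical.

```python
def holds_majority_of(new_members, previous_members):
    """True if new_members contains more than half of previous_members."""
    new_members, previous_members = set(new_members), set(previous_members)
    return 2 * len(new_members & previous_members) > len(previous_members)


def is_valid_primary_chain(views):
    """Check the rule for every consecutive pair in a sequence of membership sets."""
    return all(holds_majority_of(nxt, prev) for prev, nxt in zip(views, views[1:]))


# Hypothetical history: the group shrinks from five members to two, then grows again.
CHAIN = [
    {"a", "b", "c", "d", "e"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b", "f", "g"},
]
```

Under these assumptions the view {a, b} is acceptable dynamically, although it is not a majority of the original five processes and would be rejected by a static majority rule.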

In particular, Lotem, Keidar, and Dolev [26] have described an implementation of a group 
membership service that yields only primary views, according to a dynamic notion of primary. 
An interesting feature of their work is that it points out various subtleties of implementing such a 
membership service in a distributed manner - subtleties involving different opinions by different 
processes about what is the previous primary view. These difficulties have led to errors in some of 
the past work on dynamic voting. The algorithm of [26] copes with these subtleties by maintaining 
information about a collection of primary views that "might be" the previous primary view. The 
service deals with group membership only, and not with communication. Lotem et al. prove that 
their protocol satisfies the following condition on system executions: any two (primary) views 
that occur in an execution are linked by a chain of views where for every consecutive pair of views 
in the chain, there is some process that "knows" it belongs to both views. 

In this paper, we present a new formal automaton specification, DVS, for the safety guarantees
made by a practical dynamic primary view group communication service. This service is inspired
by the implementation of Lotem et al., but integrates communication with the group membership 
service. An important feature of our specification is our careful handling of the interface between 
the service and the application. When a new view starts, applications generally require some 
initial pre-processing, typically, an exchange of information, to prepare for ordinary computation. 
For example, processes in a coherent database application may need to exchange information 
about previous updates in order to bring everyone in the new view up to date. We expect each 
application process to indicate when it has completed this pre-processing for a new view v by 
"registering" the view. The DVS service uses registration information when it creates a new view 
v, in order to determine which previously-created views must satisfy the intersection property with 
respect to v. When all members have registered v, the application has gathered all information it 
needs from previous views, and the service no longer needs to ensure intersection in membership 
between views before v and any subsequent ones that are formed. 

Another feature of our specification, compared to that in [26], is that our specification is 
given as an automaton, which maintains state information about the views and the messages sent 
in each view. This global state can be used in invariants and abstraction functions, leading to 
assertional proofs of the correctness of implementations of DVS, and also of applications built over
DVS. In contrast, Lotem et al. use a specification given in terms of the whole sequence of events
in an execution, and therefore must use operational reasoning about complex sequences of events. 
Extensive experience with proofs of distributed algorithms suggests that assertional techniques 
are less error-prone; also they are more amenable to automated checking. 



We demonstrate the value of our DVS specification by showing both how it can be implemented
and how it can be used in an application. Both pieces are shown formally, with assertional proofs. 

First, we consider an implementation that is a variant of the group membership algorithm of 
Lotem et al.; our variant integrates communication with the membership service, uses registration 
information from the application processes saying when a view has been prepared for computation 
by the application, and uses a static view-oriented service (a version of VS) internally. We prove
that this algorithm implements DVS, in the sense of trace inclusion. The proof uses a (single-valued)
simulation relation and invariant assertions. The key to the proof is an invariant expressing
a strong condition about nonempty intersections of views; the proof of this depends on relating a 
local check of majority intersection with known views to a global check of nonempty intersection 
with existing views. 

Second, we consider an application algorithm that is a variant of an algorithm in [22, 1, 17], 
modified to use DVS instead of a static view-oriented service. The modified algorithm uses the
registration capability to tell the DVS service that information has been successfully exchanged at
the beginning of a new view. We show that it implements a (non-group-oriented) totally-ordered- 
broadcast service. This proof also uses a simulation relation and invariant assertions. 

We have designed our DVS specification to express the guarantees that we think are useful in
verifying correctness of applications that use the service. 

Among previous work, two different sorts of specifications for a primary group service are 
notable. Work by Ricciardi and others [36] is expressed in terms of temporal logic on consistent 
cuts; the idea of their specification is that on any cut, there are no disjoint sets of processes 
such that each set is collectively aware of no members outside that set. Lotem et al. [26] use a 
property of an execution, which was previously defined by Cristian [9] for majority groups: any 
two (primary) views are linked by a chain of views where every consecutive pair of views includes 
a process that "knows" it belongs to both views. As far as we know, these previous specifications 
have not been used to verify properties of applications running above them. 

Our specification omits some properties of existing dynamic primary view management algo- 
rithms. For example, Isis [5] guarantees that processes that move together from one view to the 
next receive exactly the same messages in the first view. Guaranteeing this property requires state 
exchange within the view management service. This property is not needed to verify properties 
of applications such as the one giving a totally-ordered broadcast. Also, our service provides no
explicit support for application-level state exchange. Systems like Isis do provide such support,
by allowing application-level state exchange messages to be piggybacked on the lower-level state 
exchange messages. 

In Section 2 we present our mathematical notation. The DVS service is presented in Section 3,
and its implementation in Section 4. In Section 5 we use DVS to implement a totally ordered
broadcast service. Section 6 contains some conclusions. 



2 Mathematical foundations 

2.1 Sets, functions, sequences 

We write $\lambda$ for the empty sequence. If $a$ is a sequence then $|a|$ denotes the length of $a$. If $a$ is
a sequence and $1 \leq i \leq j \leq |a|$ then $a(i)$ denotes the $i$th element of $a$ and $a(i..j)$ denotes the
subsequence $a(i), a(i+1), \ldots, a(j)$ of $a$. The head of a nonempty sequence $a$ is $a(1)$. A sequence
can be used as a queue: the append operation modifies the sequence by concatenating it with a
new element and the remove operation modifies the sequence by deleting its head.

If $a$ and $b$ are sequences, $a$ finite, then $a + b$ denotes the concatenation of $a$ and $b$. We sometimes
abuse this notation by letting $a$ or $b$ be a single element. We say that sequence $a$ is a prefix of
sequence $b$, written $a \leq b$, provided that there exists $c$ such that $a + c = b$. A collection $A$ of
sequences is consistent provided that $a \leq b$ or $b \leq a$ for all $a, b \in A$. If $A$ is a consistent collection
of sequences, we define $\mathit{lub}(A)$ to be the minimum sequence $b$ such that $a \leq b$ for all $a \in A$.
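
For finite collections of finite sequences, these definitions can be sketched directly in Python. The function names `is_prefix`, `is_consistent` and `lub` mirror the notation above but are otherwise an assumption of this example, as is the restriction to finite collections (for which the least upper bound is simply the longest member, or the empty sequence when the collection is empty).

```python
def is_prefix(a, b):
    """a <= b in the prefix order: b starts with a."""
    return len(a) <= len(b) and list(b[:len(a)]) == list(a)


def is_consistent(seqs):
    """A collection is consistent if every pair is comparable in the prefix order."""
    seqs = list(seqs)
    return all(is_prefix(a, b) or is_prefix(b, a) for a in seqs for b in seqs)


def lub(seqs):
    """Least upper bound of a finite consistent collection of finite sequences."""
    seqs = list(seqs)
    if not is_consistent(seqs):
        raise ValueError("lub is only defined for a consistent collection")
    # In a finite consistent collection the longest sequence extends all others.
    return max(seqs, key=len, default=[])
```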

If $S$ is a set, then $\mathit{seqof}(S)$ denotes the set of all finite sequences of elements of $S$. If $a \in \mathit{seqof}(S)$
and $f$ is a partial function from $S$ to $T$ whose domain includes the set of all elements of $S$
appearing in $a$, then $\mathit{applytoall}(f, a)$ denotes the sequence $b$ such that $\mathit{length}(b) = \mathit{length}(a)$ and,
for $i \leq \mathit{length}(b)$, $b(i) = f(a(i))$.

If $S$ is a set, the notation $S_\bot$ refers to the set $S \cup \{\bot\}$. Whenever $S$ is ordered, we order $S_\bot$
by extending the order on $S$, and making $\bot$ less than all elements of $S$. If $R$ is a binary relation,
then we define $\mathit{dom}(R)$, the domain of $R$, to be the set (without repetitions) of first elements of
the ordered pairs comprising relation $R$. If $f$ is a partial function from $S$ to $T$, and $(s, t) \in S \times T$,
then $f \oplus (s, t)$ is defined to be the partial function that is identical to $f$ except that $f(s) = t$.
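
As a small illustration, a partial function can be modelled as a Python dictionary. The sketch below shows applytoall and the override operation under that assumption; the function name `override` is hypothetical and chosen only for this example.

```python
def applytoall(f, a):
    """Apply the partial function f (a dict) to every element of the sequence a."""
    # A KeyError signals that f is undefined on some element of a.
    return [f[x] for x in a]


def override(f, s, t):
    """Return a new partial function identical to f except that s maps to t."""
    g = dict(f)  # copy, so the original function is left unchanged
    g[s] = t
    return g
```

For instance, with `f = {"a": 1, "b": 2}`, `applytoall(f, ["a", "b", "a"])` yields `[1, 2, 1]`, and the domain of `f` is `set(f)`.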

$\mathcal{P}$ denotes the universe of all processors,$^1$ and $\mathcal{M}$ the universe of all possible messages. $\mathcal{G}$ is
a totally ordered set of identifiers used to distinguish views, with a distinguished least element
$g_0$. A view $v = \langle g, P \rangle$ consists of a view identifier $g \in \mathcal{G}$ and a nonempty membership set
$P \subseteq \mathcal{P}$; we write $v.id$ and $v.set$ to denote the view identifier and membership set components of
$v$, respectively. $\mathcal{V}$ denotes the set of all views, and $v_0 = \langle g_0, P_0 \rangle$ is a distinguished initial view.

2.2 I/O automata 

We describe our services and algorithms using the I/O automaton model of Lynch and Tuttle [28] 
(without fairness). The model and its proof methods are described in Chapter 8 of [27]. 

An execution fragment of an I/O automaton is an alternating sequence of states and actions 
consistent with the transition relation. An execution is an execution fragment that begins with 
a start state. The trace of an execution fragment a is the subsequence of a consisting of all the 
external actions. The external behavior of an I/O automaton is captured by the set of traces 
generated by its executions. 

Execution fragments can be concatenated. Definitions of composition for I/O automata appear 
in Chapter 8 of [27], along with theorems showing that composition respects the external behavior. 
Invariant and simulation methods for these models are also presented in that chapter. 

1 We use "processor" and "process" interchangeably, since the difference is immaterial in the context of this 
paper. 



3 The DVS specification

We now present DVS, our specification for a dynamic primary view group communication service. 

The DVS service works as follows. Each client of the service has a "current" view of the group
of processes. A process can send a message to all other members of its current view and the 
service guarantees that messages sent within a view are delivered only within that view and each 
member of the view receives messages in the same order as other members. However, not all 
messages need to be delivered to all members. The service also provides a "safe" notification 
for a particular message m that tells the recipient that message m has been received by all the 
members of the current view. New views are announced to all members of the new view and 
they are guaranteed to be "primary" views. Primary views are defined according to a dynamic 
notion [21]: a new primary needs to contain a majority of the members of the previous primary. 
The DVS service allows the clients to "register" a new view after completing the pre-processing
for that view. 

The specification is given in Figure 1. In this specification, $\mathcal{M}_c \subseteq \mathcal{M}$ denotes the set of messages
that clients may use for communication. The most interesting part of the DVS specification
is the transition definition for dvs-createview(v). The precondition specifies the properties that a
view must satisfy in order to be considered primary. For example, the precondition says that v.set
must intersect the membership set of all previously-created smaller-id views w for which there is
no intervening totally registered view - that is, the set of all "possible previous primary views".
Since (for convenience) we allow out-of-order view creation in DVS, we also include a symmetric
condition for previously-created larger-id views. All created views are recorded in created.
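
The precondition of dvs-createview(v) can be read as an executable check. The following Python sketch is an illustration only, not part of the specification: the `View` type, the representation of registered as a dictionary from view identifiers to sets, and the function name `can_create` are assumptions of this example.

```python
from typing import FrozenSet, NamedTuple


class View(NamedTuple):
    vid: int                 # stands in for v.id
    members: FrozenSet[str]  # stands in for v.set


def can_create(v, created, registered):
    """Check the dvs-createview(v) precondition against the current state."""
    # Identifiers of totally registered views: every member has registered.
    totreg_ids = {x.vid for x in created
                  if x.members <= registered.get(x.vid, set())}
    for w in created:
        if w.vid == v.vid:
            return False  # view identifiers must be unique
        lo, hi = sorted((w.vid, v.vid))
        # Covers both the smaller-id case and the symmetric larger-id case.
        separated = any(lo < x < hi for x in totreg_ids)
        if not separated and not (v.members & w.members):
            return False  # w is a possible neighbouring primary that v misses
    return True
```

In this sketch a view w stops constraining new views once a totally registered view lies strictly between w and the candidate in identifier order.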

DVS informs its clients of view changes using dvs-newview($\langle g, P \rangle$)_p actions; such an action
informs processor p that the view identifier g is associated with membership set P and that the
current group of processors connected to p is P. After any finite execution, we define the current
view at p to be the argument v in the last dvs-newview(v)_p event, if any; otherwise it is the initial
view $v_0$ for processors in $P_0$ and is undefined for other processors. Even though views can be
created out of view id order, the notification to each client is consistent with that order. Not
every client needs to see every view. The variable attempted records, for each view, which processes
have been notified of that view. Variable attempted is only used in the proof.

With the dvs-register_p action, the client at p informs the service that it has obtained whatever
information the application needs to begin operating in the new view v. For many applications,
this will mean that p has received messages from every other member of view v, reporting its state
at the start of v. The variable registered records, for each view, which processes have registered
that view. Variable registered is only used in the proof.

DVS allows a processor p to broadcast a message m using a dvs-gpsnd(m)_p action, and delivers
the message to a processor q using a dvs-gprcv(m)_{p,q} action. DVS also uses a dvs-safe(m)_{p,q} action
to report to processor q that the earlier message m from p has been delivered to all members of
the current view of q. DVS guarantees that messages sent by a processor p when the current view
of p is v are delivered only within view v (i.e., only to processors in v.set whose current view is
v). Moreover, each processor receives messages in the same order as other processors and without
gaps in the sequence of received messages; however, a processor may receive only a prefix of the
sequence of messages received by another processor. Variables queue, pending, next and next-safe
are used for handling the messages. Their use should be clear from the code.
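
The interplay of queue, pending, next and next-safe can be illustrated by a small Python sketch of the message state of a single view. It is a simplification under stated assumptions: all members are taken to have this view as their current view, view changes are not modelled, and the class name `ViewChannel` and its method names are hypothetical.

```python
class ViewChannel:
    """Message state of one view; every member is assumed to be in this view."""

    def __init__(self, members):
        self.members = set(members)
        self.pending = {p: [] for p in self.members}    # per-sender unsent messages
        self.queue = []                                 # total order of (m, p) pairs
        self.next = {p: 1 for p in self.members}        # next index to deliver
        self.next_safe = {p: 1 for p in self.members}   # next index to report safe

    def gpsnd(self, m, p):
        self.pending[p].append(m)

    def order(self, p):
        """Move the head of pending[p] to the end of the view's queue."""
        if self.pending[p]:
            self.queue.append((self.pending[p].pop(0), p))

    def gprcv(self, q):
        """Deliver the next queued message to q, or None if q is up to date."""
        i = self.next[q]
        if i > len(self.queue):
            return None
        self.next[q] = i + 1
        return self.queue[i - 1]

    def safe(self, q):
        """Report the next message as safe at q once every member has received it."""
        i = self.next_safe[q]
        if i > len(self.queue) or not all(self.next[r] > i for r in self.members):
            return None
        self.next_safe[q] = i + 1
        return self.queue[i - 1]
```

Because each receiver advances its own index into one shared queue, receivers see messages in the same order and without gaps, while a slow receiver simply holds a shorter prefix.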



Signature:

  Input:    dvs-gpsnd(m)_p, $m \in \mathcal{M}_c$, $p \in \mathcal{P}$
            dvs-register_p, $p \in \mathcal{P}$
  Output:   dvs-gprcv(m)_{p,q}, $m \in \mathcal{M}_c$, $p, q \in \mathcal{P}$
            dvs-safe(m)_{p,q}, $m \in \mathcal{M}_c$, $p, q \in \mathcal{P}$
            dvs-newview(v)_p, $v \in \mathcal{V}$, $p \in v.set$
  Internal: dvs-createview(v), $v \in \mathcal{V}$
            dvs-order(m,p,g), $m \in \mathcal{M}_c$, $p \in \mathcal{P}$, $g \in \mathcal{G}$

State:

  created $\in 2^{\mathcal{V}}$, init $\{v_0\}$
  for each $p \in \mathcal{P}$:
    current-viewid[p] $\in \mathcal{G}_\bot$, init $g_0$ if $p \in P_0$, $\bot$ else
  for each $g \in \mathcal{G}$:
    queue[g] $\in$ seqof($\mathcal{M}_c \times \mathcal{P}$), init $\lambda$
    attempted[g] $\in 2^{\mathcal{P}}$, init $P_0$ if $g = g_0$, $\{\}$ else
    registered[g] $\in 2^{\mathcal{P}}$, init $P_0$ if $g = g_0$, $\{\}$ else
  for each $p \in \mathcal{P}$, $g \in \mathcal{G}$:
    pending[p,g] $\in$ seqof($\mathcal{M}_c$), init $\lambda$
    next[p,g] $\in \mathbb{N}^{>0}$, init 1
    next-safe[p,g] $\in \mathbb{N}^{>0}$, init 1

Transitions:

  internal dvs-createview(v)
    Pre: $\forall w \in \mathit{created} : v.id \neq w.id$
         $\forall w \in \mathit{created}$ :
           $\exists x \in \mathit{TotReg} : w.id < x.id < v.id$
           or $\exists x \in \mathit{TotReg} : v.id < x.id < w.id$
           or $v.set \cap w.set \neq \{\}$
    Eff: $\mathit{created} := \mathit{created} \cup \{v\}$

  output dvs-newview(v)_p
    Pre: $v \in \mathit{created}$
         v.id > current-viewid[p]
    Eff: current-viewid[p] := v.id
         attempted[v.id] := attempted[v.id] $\cup \{p\}$

  input dvs-register_p
    Eff: if current-viewid[p] $\neq \bot$ then
           registered[current-viewid[p]] :=
             registered[current-viewid[p]] $\cup \{p\}$

  input dvs-gpsnd(m)_p
    Eff: if current-viewid[p] $\neq \bot$ then
           append m to pending[p, current-viewid[p]]

  internal dvs-order(m,p,g)
    Pre: m is head of pending[p,g]
    Eff: remove head of pending[p,g]
         append $\langle m, p \rangle$ to queue[g]

  output dvs-gprcv(m)_{p,q}, choose g
    Pre: g = current-viewid[q]
         queue[g](next[q,g]) = $\langle m, p \rangle$
    Eff: next[q,g] := next[q,g] + 1

  output dvs-safe(m)_{p,q}, choose g, P
    Pre: g = current-viewid[q]
         $\langle g, P \rangle \in \mathit{created}$
         queue[g](next-safe[q,g]) = $\langle m, p \rangle$
         for all $r \in P$:
           next[r,g] > next-safe[q,g]
    Eff: next-safe[q,g] := next-safe[q,g] + 1

Figure 1: The DVS service 



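We define the following derived variables:

  $\mathit{Att} \in 2^{\mathcal{V}}$, defined as $\{v \in \mathit{created} \mid \mathit{attempted}[v.id] \neq \{\}\}$
  $\mathit{TotAtt} \in 2^{\mathcal{V}}$, defined as $\{v \in \mathit{created} \mid v.set \subseteq \mathit{attempted}[v.id]\}$
  $\mathit{Reg} \in 2^{\mathcal{V}}$, defined as $\{v \in \mathit{created} \mid \mathit{registered}[v.id] \neq \{\}\}$
  $\mathit{TotReg} \in 2^{\mathcal{V}}$, defined as $\{v \in \mathit{created} \mid v.set \subseteq \mathit{registered}[v.id]\}$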

Informally, a view belongs to the set $\mathit{Att}$ if it has been reported to at least one member of the
view (we say that it is attempted). A view belongs to the set $\mathit{TotAtt}$ if it has been reported to all
members of the view (we say that the view is totally attempted). Similarly, a view belongs to the
set $\mathit{Reg}$ if at least one member of the view has registered the view (we say that it is registered)
and belongs to the set $\mathit{TotReg}$ if all members of the view have registered the view (we say that
the view is totally registered).
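
The four derived sets can be computed from the state in a few lines. The Python sketch below is illustrative only; the `View` type, the dictionary representation of attempted and registered, and the function name `derived_sets` are assumptions of this example.

```python
from typing import FrozenSet, NamedTuple


class View(NamedTuple):
    vid: int                 # stands in for v.id
    members: FrozenSet[str]  # stands in for v.set (nonempty)


def derived_sets(created, attempted, registered):
    """Return (Att, TotAtt, Reg, TotReg) for the given state components."""
    empty = frozenset()
    att = {v for v in created if attempted.get(v.vid, empty)}
    tot_att = {v for v in created if v.members <= attempted.get(v.vid, empty)}
    reg = {v for v in created if registered.get(v.vid, empty)}
    tot_reg = {v for v in created if v.members <= registered.get(v.vid, empty)}
    return att, tot_att, reg, tot_reg
```

Since membership sets are nonempty, a totally attempted view is in particular attempted, and likewise for registration.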

We close this section with some invariants giving properties of DVS. 

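The first one is a trivial invariant which follows directly from the definition of the sets
$\mathit{Att}$, $\mathit{TotAtt}$, $\mathit{Reg}$ and $\mathit{TotReg}$.

Invariant 3.1 (DVS)
In any reachable state, $\mathit{TotAtt} \subseteq \mathit{Att}$, $\mathit{TotReg} \subseteq \mathit{Reg}$, $\mathit{Reg} \subseteq \mathit{Att}$, and $\mathit{TotReg} \subseteq \mathit{TotAtt}$.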

The next invariant is a basic invariant saying that if a process p has attempted a view v whose
identifier is g then the current view of p is either v itself or a view with an identifier greater than
g.

Invariant 3.2 (DVS)

In any reachable state, if $p \in$ attempted[g] then current-viewid[p] $\geq g$.

Proof: By induction on the length of the execution. The base case consists of proving that the
invariant is true in the initial state. In the initial state $p \in$ attempted[g] implies that $p \in P_0$ and
$g = g_0$. For $p \in P_0$ we have that current-viewid[p] $= g_0$ and hence the invariant is true.

For the inductive step assume the invariant is true in $s$. We need to prove that it is true
in $s'$ for any possible step $(s, \pi, s')$. The only step that changes attempted and current-viewid is
$\pi$ = dvs-newview(v)_p. By the precondition of $\pi$ and the inductive hypothesis, for any g for which
$p \in$ attempted[g] in $s$ it holds that v.id > current-viewid[p] $\geq g$, and by the code the new value of
current-viewid[p] is v.id, which also covers the newly added membership $p \in$ attempted[v.id].
Hence the invariant is still true. □

Invariant 3.3 expresses the key intersection property guaranteed by DVS; this is weaker than 
the intersection property required by static definitions of primary views, which says that all 
primary components must intersect. This invariant is our version of the correctness requirement 
for dynamic view services that two consecutive primary views intersect. 

Invariant 3.3 (DVS)

In any reachable state, if $v, w \in \mathit{created}$, $v.id < w.id$, and there is no $x \in \mathit{TotReg}$ such that
$v.id < x.id < w.id$, then $v.set \cap w.set \neq \{\}$.

Proof: By induction on the length of the execution. The base case consists of proving that the
invariant is true in the initial state. In the initial state $\mathit{created} = \{v_0\}$ and thus the invariant is
vacuously true.

For the inductive step assume the invariant is true in $s$. We need to prove that it is true in $s'$
for any possible step $(s, \pi, s')$. The only steps that can change the hypothesis from false to true
are dvs-createview(v) and dvs-createview(w). The preconditions of these actions show that the
needed conclusion holds. No step changes the conclusion from true to false. □

Invariant 3.4 says that if a view w is totally attempted, then any earlier view v has a member
whose current view is later than v.

Invariant 3.4 (DVS)

In any reachable state, if $v \in \mathit{created}$, $w \in \mathit{TotAtt}$, and $v.id < w.id$, then there exists $p \in v.set$
with current-viewid[p] > v.id.

Proof: Consider any particular reachable state. Assume that $v \in \mathit{created}$, $w \in \mathit{TotAtt}$, and
$v.id < w.id$. Then let y be the view in $\mathit{TotAtt}$ having the smallest viewid strictly greater than
v.id. Then there is no $x \in \mathit{TotAtt}$ with $v.id < x.id < y.id$. Then Invariant 3.1 implies that there
is no $x \in \mathit{TotReg}$ with $v.id < x.id < y.id$. Then Invariant 3.3 implies that $v.set \cap y.set \neq \{\}$. Let
$p \in v.set \cap y.set$; then $p \in$ attempted[y.id]. Then Invariant 3.2 implies that current-viewid[p] $\geq y.id$.
This implies current-viewid[p] > v.id. □



4 An implementation of DVS 

We now present an algorithm that implements the DVS service specification and reason about
its correctness. Our implementation uses as a building block the group communication service
VS [17], and it uses the ideas from [26]. The overall system comprises the automata VS-TO-DVS_p,
for each $p \in \mathcal{P}$, and the VS service. We call this system DVS-IMPL and we illustrate
it in Figure 2. Formally, the DVS-IMPL system is the composition of all VS-TO-DVS_p automata
(presented in Section 4.2) and the VS automaton (given in Section 4.1). We show that DVS-IMPL
is a formal implementation of the DVS service in the sense of trace inclusion, that is, we prove
that any trace of the DVS implementation is a trace of the DVS specification (Section 4.3).




Figure 2: The DVS-IMPL system. 



4.1 The VS specification 

The VS service [17] is a group communication service that is similar to DVS except that VS does
not provide support for primary views. The DVS service thus can be seen as an augmented version
of the VS service designed to provide support for primary views. Due to the similarity of the two
services, VS is a convenient building block for DVS. The specification for the VS service is given in
Figure 3. To avoid a complete restatement we refer the reader to [17] for an informal description
of the service.



Signature:

  Input:    vs-gpsnd(m)_p, $m \in \mathcal{M}$, $p \in \mathcal{P}$
  Output:   vs-gprcv(m)_{p,q}, $m \in \mathcal{M}$, $p, q \in \mathcal{P}$
            vs-safe(m)_{p,q}, $m \in \mathcal{M}$, $p, q \in \mathcal{P}$
            vs-newview(v)_p, $v \in \mathcal{V}$, $p \in v.set$
  Internal: vs-createview(v), $v \in \mathcal{V}$
            vs-order(m,p,g), $m \in \mathcal{M}$, $p \in \mathcal{P}$, $g \in \mathcal{G}$

State:

  created $\in 2^{\mathcal{V}}$, init $\{v_0\}$
  for each $p \in \mathcal{P}$:
    current-viewid[p] $\in \mathcal{G}_\bot$, init $g_0$ if $p \in P_0$, $\bot$ else
  for each $g \in \mathcal{G}$:
    queue[g] $\in$ seqof($\mathcal{M} \times \mathcal{P}$), init $\lambda$
  for each $p \in \mathcal{P}$, $g \in \mathcal{G}$:
    pending[p,g] $\in$ seqof($\mathcal{M}$), init $\lambda$
    next[p,g] $\in \mathbb{N}^{>0}$, init 1
    next-safe[p,g] $\in \mathbb{N}^{>0}$, init 1

Transitions:

  internal vs-createview(v)
    Pre: $\forall w \in \mathit{created} : v.id > w.id$
    Eff: $\mathit{created} := \mathit{created} \cup \{v\}$

  output vs-newview(v)_p
    Pre: $v \in \mathit{created}$
         v.id > current-viewid[p]
    Eff: current-viewid[p] := v.id

  input vs-gpsnd(m)_p
    Eff: if current-viewid[p] $\neq \bot$ then
           append m to pending[p, current-viewid[p]]

  internal vs-order(m,p,g)
    Pre: m is head of pending[p,g]
    Eff: remove head of pending[p,g]
         append $\langle m, p \rangle$ to queue[g]

  output vs-gprcv(m)_{p,q}, choose g
    Pre: $g \neq \bot$
         g = current-viewid[q]
         queue[g](next[q,g]) = $\langle m, p \rangle$
    Eff: next[q,g] := next[q,g] + 1

  output vs-safe(m)_{p,q}, choose g, P
    Pre: $g \neq \bot$
         g = current-viewid[q]
         $\langle g, P \rangle \in \mathit{created}$
         queue[g](next-safe[q,g]) = $\langle m, p \rangle$
         for all $r \in P$:
           next[r,g] > next-safe[q,g]
    Eff: next-safe[q,g] := next-safe[q,g] + 1

Figure 3: The VS service 

As also reasoned in [17], the fact that VS allows views to be created only in the order of view 
identifier is not significant: weakening this requirement to allow out-of-order view creation does 
not change the external behavior, because vs-newview actions are constrained to occur in such a 
way that views are always delivered in the order of view identifiers. 

We rely on the following safety properties of the VS service [17]: 

• New views are reported in increasing order of view identifier (monotone views property); 

• Messages sent in a view are delivered only within that view (view synchrony property); 

• The sequences of messages delivered in a view at any two processors are such that one sequence
is a prefix of the other (prefix order property).

The following invariant holds.

Invariant 4.1 (VS)

In any reachable state, if $v, v' \in \mathit{created}$ and $v.id = v'.id$, then $v = v'$.

4.2 The DVS implementation algorithm and the DVS-IMPL system

The DVS implementation algorithm is given in terms of the automaton VS-TO-DVS_p, where $p \in \mathcal{P}$,
in Figure 4. VS-TO-DVS_p acts as a "filter", receiving vs-newview inputs from the underlying
VS service and deciding whether to accept the proposed views as primary views. If VS-TO-DVS_p
decides to accept some such view v, it "attempts" the view by performing a dvs-newview(v) output.
For each v, we think of the DVS internal dvs-createview(v) action as occurring at the time of the
first dvs-newview(v) event.

VS-TO-DVS_p uses special messages, tagged either with "info" or "registered". Thus, we use
$\mathcal{M} = \mathcal{M}_c \cup (\{\text{"info"}\} \times \mathcal{V} \times 2^{\mathcal{V}}) \cup \{\text{"registered"}\}$, where $\mathcal{M}_c$ is the set of all client messages and
$\mathcal{M}$ is the universe of all messages. The state variables attempted, reg, and info-sent are auxiliary
- they are not needed for the algorithm, and are only used in the proofs.

According to the DVS specification, the algorithm is supposed to guarantee nonempty 
intersection of each newly created primary view v with any previously created view w having no 
intervening totally registered view. This is a global condition involving nonempty intersection 
of view sets. The VS-TO-DVS_p processors, however, do not have accurate knowledge of which 
primary views have been created by other processors, nor of which views are totally registered. 
Therefore, the processors employ a local check of majority intersection with known views, rather 
than a global check of nonempty intersection with existing views. Specifically, each VS-TO-DVS_p 
keeps track of an "active" view act, which is the latest view that it knows to be totally registered, 
plus a set of "ambiguous" views amb, which are all the views that it knows have been attempted 
(i.e., have had a dvs-newview action performed someplace), and whose identifiers are greater than 
act.id. We define use = {act} ∪ amb. When VS-TO-DVS_p receives a vs-newview(v) input, it sends 
out "info" messages containing its current act and amb values to all the other processors in the 
new view, using the VS service, and then waits to receive corresponding "info" messages for view 
v from all the other processors in the view. After receiving this information (and updating its 
own act and amb accordingly), VS-TO-DVS_p checks that v has a majority intersection with each 
view in use. If so, VS-TO-DVS_p performs a dvs-newview(v)_p output. 
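
The local check described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the automaton itself: the View class, its members field (standing for v.set) and the function names are hypothetical, and the remaining preconditions of dvs-newview (for example, having received "info" messages from all other members) are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """A view: an identifier plus a membership set (v.id and v.set)."""
    id: int
    members: frozenset


def majority_intersection(v, w):
    """True if |v.set ∩ w.set| > |w.set| / 2 (integer arithmetic only)."""
    return 2 * len(v.members & w.members) > len(w.members)


def passes_local_check(v, act, amb):
    """Check v against every view in use = {act} ∪ amb."""
    use = {act} | set(amb)
    return all(majority_intersection(v, w) for w in use)
```

With act = {a, b, c, d, e} and amb containing the view {a, b, c}, a proposed view {a, d, e} passes the check against act (3 of 5 members) but fails against the ambiguous view (1 of 3 members), so it would not be attempted.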

Following the dvs-newview(v)_p event, the clients of the communication system can exchange state 
information as needed for processing in view v. When the client at p has obtained enough 
information, it "registers" the view by means of action dvs-register_p, which causes processor p to 
send "registered" messages to the other members. When a processor receives "registered" messages 
for a view v from all members, it may perform garbage collection by discarding information about 
views with identifiers smaller than that of v. VS-TO-DVS_p uses VS to send and receive messages. 
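
The registration and garbage-collection steps can be sketched in Python as follows. This is a simplified illustration only: the class name Filter, the method names and the set-of-pairs representation of the received "registered" messages are assumptions made for this example, and message transport through VS is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """A view: an identifier plus a membership set (v.id and v.set)."""
    id: int
    members: frozenset


class Filter:
    """Hypothetical fragment of the per-processor state: act, amb, rcvd-rgst."""

    def __init__(self, act, amb):
        self.act = act
        self.amb = set(amb)
        self.rcvd_rgst = set()  # pairs (sender, view id)

    def on_registered(self, sender, view_id):
        """Record a "registered" message from sender for the given view."""
        self.rcvd_rgst.add((sender, view_id))

    def garbage_collect(self, v):
        """Make v the active view once all members are known to have registered.

        Returns True if the step was enabled and performed.
        """
        all_registered = all((q, v.id) in self.rcvd_rgst for q in v.members)
        if not (all_registered and v.id > self.act.id):
            return False
        self.act = v
        # Discard ambiguous views whose identifiers do not exceed act.id.
        self.amb = {w for w in self.amb if w.id > self.act.id}
        return True
```

In this sketch, garbage collection for a view stays disabled until a "registered" message has been recorded from every member of that view.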

The system dvs-impl is defined as the composition of all the VS-TO-DVS_p automata and the 
VS service, with all the external actions of VS hidden. 

We define four derived variables for dvs-impl analogous to those of DVS, indicating the 
attempted, totally attempted, registered, and totally registered views, respectively. They are: 

• Att = {v ∈ created | (∃p ∈ v.set) v ∈ attempted_p}; 

Signature: 

Input: 
    dvs-gpsnd(m)_p, m ∈ M_c 
    dvs-register_p 
    vs-newview(v)_p, v ∈ V, p ∈ v.set 
    vs-gprcv(m)_{q,p}, m ∈ M, q ∈ P 
    vs-safe(m)_{q,p}, m ∈ M, q ∈ P 
Internal: 
    dvs-garbage-collect(v)_p, v ∈ V 
Output: 
    vs-gpsnd(m)_p, m ∈ M 
    dvs-newview(v)_p, v ∈ V, p ∈ v.set 
    dvs-gprcv(m)_{q,p}, m ∈ M_c, q ∈ P 
    dvs-safe(m)_{q,p}, m ∈ M_c, q ∈ P 

State: 

    cur ∈ V_⊥, init v_0 if p ∈ P_0, ⊥ else 
    client-cur ∈ V_⊥, init v_0 if p ∈ P_0, ⊥ else 
    act ∈ V, init v_0 
    amb ∈ 2^V, init {} 
    attempted ∈ 2^V, init {v_0} if p ∈ P_0, {} else 
    for each g ∈ G: 
        msgs-to-vs[g] ∈ seqof(M), init λ 
        msgs-from-vs[g] ∈ seqof(M_c × P), init λ 
        safe-from-vs[g] ∈ seqof(M_c × P), init λ 
        reg[g], a bool, init true if p ∈ P_0 and g = g_0, false else 
        info-sent[g] ∈ (V × 2^V)_⊥, init ⊥ 
    for each g ∈ G, q ∈ P: 
        info-rcvd[q, g] ∈ (V × 2^V)_⊥, init ⊥ 
        rcvd-rgst[q, g], a bool, init false 

Derived variables: 

    use ∈ 2^V, defined as use = {act} ∪ amb 

Transitions: 

input vs-newview(v)_p 
    Eff: cur := v 
         append ("info", act, amb) to msgs-to-vs[cur.id] 
         info-sent[cur.id] := (act, amb) 

input vs-gprcv(("info", v, V))_{q,p} 
    Eff: info-rcvd[q, cur.id] := (v, V) 
         if v.id > act.id then act := v 
         amb := {w ∈ amb ∪ V | w.id > act.id} 

input vs-safe(("info", v, V))_{q,p} 
    Eff: none 

output dvs-newview(v)_p 
    Pre: v = cur 
         v.id > client-cur.id 
         ∀q ∈ v.set, q ≠ p : info-rcvd[q, v.id] ≠ ⊥ 
         ∀w ∈ use : |v.set ∩ w.set| > |w.set|/2 
    Eff: amb := amb ∪ {v} 
         attempted := attempted ∪ {v} 
         client-cur := v 

input dvs-register_p 
    Eff: if client-cur ≠ ⊥ then 
             reg[client-cur.id] := true 
             append ("registered") to msgs-to-vs[client-cur.id] 

input vs-gprcv(("registered"))_{q,p} 
    Eff: rcvd-rgst[q, cur.id] := true 

input vs-safe(("registered"))_{q,p} 
    Eff: none 

internal dvs-garbage-collect(v)_p 
    Pre: ∀q ∈ v.set : rcvd-rgst[q, v.id] = true 
         v.id > act.id 
    Eff: act := v 
         amb := {w ∈ amb | w.id > act.id} 

input dvs-gpsnd(m)_p 
    Eff: if client-cur ≠ ⊥ then 
             append m to msgs-to-vs[client-cur.id] 

output vs-gpsnd(m)_p 
    Pre: m is head of msgs-to-vs[cur.id] 
    Eff: remove head of msgs-to-vs[cur.id] 

input vs-gprcv(m)_{q,p}, where m ∈ M_c 
    Eff: append (m, q) to msgs-from-vs[cur.id] 

output dvs-gprcv(m)_{q,p} 
    Pre: (m, q) is head of msgs-from-vs[client-cur.id] 
    Eff: remove head of msgs-from-vs[client-cur.id] 

input vs-safe(m)_{q,p}, where m ∈ M_c 
    Eff: append (m, q) to safe-from-vs[cur.id] 

output dvs-safe(m)_{q,p} 
    Pre: (m, q) is head of safe-from-vs[client-cur.id] 
    Eff: remove head of safe-from-vs[client-cur.id] 

Figure 4: VS-TO-DVS_p 

• TotAtt = {v ∈ created | (∀p ∈ v.set) v ∈ attempted_p}; 

• Reg = {v ∈ created | (∃p ∈ v.set) reg[v.id]_p = true}; and 

• TotReg = {v ∈ created | (∀p ∈ v.set) reg[v.id]_p = true}. 

Another derived variable, use_p, is defined in the code of VS-TO-DVS_p. 
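
To make the four derived variables concrete, the following Python sketch computes them from a snapshot of the per-processor state. The View class, the function name and the dictionary-based representation of attempted_p and reg[g]_p are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """A view: an identifier plus a membership set (v.id and v.set)."""
    id: int
    members: frozenset


def derived_views(created, attempted, reg):
    """Return (Att, TotAtt, Reg, TotReg) for a state snapshot.

    created:   set of View
    attempted: dict mapping processor -> set of View (attempted_p)
    reg:       dict mapping processor -> set of view ids g with reg[g]_p = true
    """
    def has_attempted(p, v):
        return v in attempted.get(p, set())

    def has_registered(p, v):
        return v.id in reg.get(p, set())

    att = {v for v in created if any(has_attempted(p, v) for p in v.members)}
    tot_att = {v for v in created if all(has_attempted(p, v) for p in v.members)}
    reg_set = {v for v in created if any(has_registered(p, v) for p in v.members)}
    tot_reg = {v for v in created if all(has_registered(p, v) for p in v.members)}
    return att, tot_att, reg_set, tot_reg
```

A view attempted or registered by only some of its members belongs to Att or Reg but not to TotAtt or TotReg.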

4.3 Correctness of the dvs-impl system 

We prove that dvs-impl implements dvs using a forward simulation argument [29] by providing 
an abstraction function that maps states of dvs-impl to states of dvs and that leads to the main 
observation that each trace of dvs-impl is a trace of dvs. We present such an abstraction function 
in Section 4.3.2. 

Section 4.3.1 unveils a series of invariants of dvs-impl culminating in Invariant 4.17 and 
Invariant 4.18. The local condition requiring a majority intersection is captured by Invariant 4.17. 
Invariant 4.18 states that any two attempted views that have no intervening totally registered 
view have at least one member in common. This is the global condition on nonempty intersection 
that we have discussed in the previous section. These invariants are then used in the proof that 
dvs-impl implements dvs in Section 4.3.2. 

4.3.1 Invariants 

We begin with invariants that state simple facts about dvs and then proceed to more complex 
ones ending with the key invariant about the global condition on nonempty intersections. 

Invariant 4.2 (dvs-impl) 

In any reachable state, if cur_p ≠ ⊥ then current-viewid[p] = cur.id_p. 

Proof: By induction on the length of the execution. The base case consists of proving that the 
invariant is true in the initial state. Fix p. In the initial state we have that cur_p = ⊥. 

For the inductive step assume the invariant is true in s. We need to prove that it is true in s' 
for any possible step (s, π, s'). Fix p. We prove the invariant considering each possible action π. 

1. π = vs-newview(v)_p. 

By the code of π in vs, we have that current-viewid[p] = v.id. By the code of π in dvs-impl, 
we have that cur.id_p = v.id. 

2. Other actions. 

Variables current-viewid[p] and cur.id_p are not modified. Hence the assertion cannot be made 
false. 

Invariant 4.3 (dvs-impl) 

In any reachable state, if v ∈ attempted_p then client-cur.id_p ≥ v.id. 

Proof: By induction on the length of the execution. The base case consists of proving that the 
invariant is true in the initial state. Fix v, p. In the initial state we have that attempted_p = {v_0} for 
p ∈ P_0 and attempted_p = {} for p ∉ P_0. So assume that v = v_0 and p ∈ P_0. Then client-cur_p = v_0. 
Hence the invariant is true. 

For the inductive step assume the invariant is true in s. We need to prove that it is true in 
s' for any possible step (s, π, s'). Fix v, p and assume that v ∈ s'.attempted_p. We distinguish two 
possible cases. 

1. v ∈ s.attempted_p. 

By the inductive hypothesis we have that s.client-cur.id_p ≥ v.id. By the monotonicity of 
client-cur.id_p we have that s'.client-cur.id_p ≥ s.client-cur.id_p. 

2. v ∉ s.attempted_p. 

Then it must be π = dvs-newview(v)_p. The invariant follows from the code, which sets client-cur_p 
to v. 



Invariant 4.4 (dvs-impl) 

In any reachable state, if info-sent[g]_p = (x, X) then cur.id_p ≥ g. 

Proof: By induction on the length of the execution. The base case consists of proving that the 
invariant is true in the initial state. Fix p and g. In the initial state we have that info-sent[g]_p = ⊥ and 
thus the invariant is vacuously true. 

For the inductive step assume the invariant is true in s. We need to prove that it is true in 
s' for any possible step (s, π, s'). Fix p, g, x, X and assume that s'.info-sent[g]_p = (x, X). We 
distinguish two possible cases. 

1. s.info-sent[g]_p = (x, X) 

By the inductive hypothesis we have that s.cur.id_p ≥ g. By the monotonicity of cur.id_p we have 
that s'.cur.id_p ≥ s.cur.id_p. Hence the invariant is true. 

2. s.info-sent[g]_p ≠ (x, X) 

Then it must be π = vs-newview(v)_p and g = v.id. Action vs-newview(v)_p sets s'.cur_p 
to v, so s'.cur.id_p = g. 

□ 

Invariant 4.5 (dvs-impl) 
In any reachable state: 

1. v_0 ∈ TotReg. 

2. g_0 ≤ v.id for all v ∈ created. 

Proof: By induction on the length of the execution. The base case consists of proving that the 
invariant is true in the initial state. Part 1 is true because in the initial state every processor 
p ∈ P_0 has reg[g_0] = true. Part 2 is true because the only view in created is v_0. 

For the inductive step assume the invariant is true in s. We need to prove that it is true in s' 
for any possible step (s, π, s'). 

Consider Part 1 first. No view is ever removed from TotReg. Hence no step can make the 
assertion false. Consider Part 2 now. Fix v and assume that v ∈ s'.created. We distinguish two 
cases. 

1. v ∈ s.created. 

Then the assertion follows from the inductive hypothesis. 

2. v ∉ s.created. 

It must be π = vs-createview(v)_p. By the precondition of this action we have that v.id > w.id 
for all w ∈ s.created. By the inductive hypothesis g_0 ≤ w.id for all w ∈ s.created. Since 
s'.created = s.created ∪ {v}, it follows that g_0 ≤ w.id for all w ∈ s'.created. 

□ 

Invariant 4.6 (dvs-impl) 

In any reachable state, if rcvd-rgst[q, v.id]_p ≠ ⊥ then cur_p ≠ ⊥. 

Proof: By induction on the length of the execution. The base case consists of proving that 
the invariant is true in the initial state. Fix p, q and v. In the initial state we have that 
rcvd-rgst[q, v.id]_p = ⊥. Hence the invariant is vacuously true. 

For the inductive step assume the invariant is true in s. We need to prove that it is true in s' 
for any possible step (s, π, s'). Fix p, q, v. We prove the invariant considering each possible action 
π. Assume that s'.rcvd-rgst[q, v.id]_p ≠ ⊥. 

1. π = vs-newview(v)_p. 

Since s'.cur_p = v we have that s'.cur_p ≠ ⊥ (vs cannot deliver ⊥, since it is not a view). 

2. π = vs-gprcv(("registered"))_{p,q}. 

By the precondition of π (see vs) we have that s.current-viewid[p] ≠ ⊥. By Invariant 4.2 we 
have s.cur.id_p = s.current-viewid[p] ≠ ⊥. Hence s'.cur.id_p = s.cur.id_p ≠ ⊥. 

3. Other actions. 

Variables rcvd-rgst[q, v.id]_p and cur_p are not modified. Hence the assertion cannot be made 
false. 

Invariant 4.7 (dvs-impl) 

In any reachable state, if cur.id_p = ⊥ then act_p = v_0 and amb_p = {}. 

Proof: By induction on the length of the execution. The base case consists of proving that 
the invariant is true in the initial state. Fix p. In the initial state we have that act_p = v_0 and 
amb_p = {}. 

For the inductive step assume the invariant is true in s. We need to prove that it is true in s' 
for any possible step (s, π, s'). Fix p. We prove the invariant considering each possible action π. 
Assume that s'.cur_p = ⊥. Since no action sets cur_p to ⊥ it must be s.cur_p = ⊥. 

1. π = vs-gprcv(("info", v, V))_{p,q}. 

This cannot happen. Indeed by the precondition of π (see vs) we have that s.current-viewid[p] ≠ 
⊥. By Invariant 4.2 we have s.cur.id_p = s.vs.current-viewid[p]. Hence s'.cur.id_p = s.cur.id_p ≠ 
⊥. But we know that s'.cur.id_p = ⊥. 

2. π = dvs-newview(v)_p. 

This cannot happen. Indeed the precondition of π says that v = s.cur_p. Since s.cur.id_p = ⊥, we 
have v = ⊥. Thus the precondition v.id > client-cur.id_p cannot be satisfied (⊥ cannot be 
strictly greater than any view identifier). 

3. π = dvs-garbage-collect(v)_p. 

This cannot happen. Indeed by Invariant 4.6 we have that s.cur_p ≠ ⊥. But we know that 
s.cur_p = ⊥. 

4. Other actions. 

Variables cur_p, act_p and amb_p are not modified. Hence the assertion cannot be made false. 

□ 

The following invariant states that if an "info" message is in transit for view v or has been 
received by some process q in view v then there exists a process p that has sent the "info" in view 
v and such that its current view is either v or a later one. 

Invariant 4.8 (dvs-impl) 

In any reachable state, let C be the following condition: 

("info", x, X) ∈ msgs-to-vs[g]_p or ("info", x, X) ∈ pending[p, g] or (("info", x, X), p) ∈ 
queue[g] or info-rcvd[p, g]_q = (x, X). 

If C is true then info-sent[g]_p = (x, X) and cur.id_p ≥ g. 

Proof: By induction on the length of the execution. The base case consists of proving that the 
invariant is true in the initial state. Fix p, q, g, x and X. In the initial state msgs-to-vs[g]_p = λ, 
pending[p, g] = λ, queue[g] = λ and info-rcvd[p, g]_q = ⊥. Hence, in the initial state, C is false and 
the invariant is vacuously true. 

For the inductive step assume that the invariant is true in a reachable state s. We need to 
prove that it is true in state s' for any possible step (s, π, s') of the execution. Fix p, q, g, x, and 
X and assume that C is true in s'. 

1. π = vs-newview(v)_p. 

By the code of π, s'.cur_p = v. Assume v.id ≠ g. Then the code of π shows that none of 
msgs-to-vs[g]_p, pending[p, g], queue[g] or info-rcvd[p, g]_q is changed during this step. Thus C 
is true also in s. By the inductive hypothesis we have s.info-sent[g]_p = (x, X) and s.cur.id_p ≥ g. 
Since we are considering the case v.id ≠ g, we have that info-sent[g]_p is not changed by π. 
Moreover the precondition of π (see vs) shows that s'.current-viewid[p] > s.current-viewid[p]. 
By Invariant 4.2, cur.id_p = current-viewid[p], so s'.cur.id_p > s.cur.id_p. This completes 
showing the conclusion for the situation v.id ≠ g. 

Assume now v.id = g. The code shows s'.cur.id_p = g as required. It remains to show that 
s'.info-sent[g]_p = (x, X). 

Action π does not alter the values of pending[p, g], queue[g] and info-rcvd[p, g]_q and 
appends ("info", s.act_p, s.amb_p) to msgs-to-vs[g]_p. We claim that it must be x = s.act_p and 
X = s.amb_p. Indeed if it is not so, then condition C is true also in state s (for the given 
p, q, g, x, X) and by the inductive hypothesis we have s.cur.id_p ≥ g = v.id. By Invariant 4.2, 
s.current-viewid[p] ≥ v.id. But this contradicts the precondition of π (see vs). 
Thus x = s.act_p and X = s.amb_p. Then the code of π shows that s'.info-sent[g]_p = (x, X), as 
required. 

2. π = vs-gprcv(("info", v, V))_{p,q}. 

If g ≠ cur.id_q then since C is true in s' it is true also in s (for the given p, q, g, x, X). 
Since the code does not change info-sent[g]_p and cur.id_p, the 
invariant follows from the inductive hypothesis. 

Hence assume that g = cur.id_q. First consider the case x = v and X = V. In this case, by 
the precondition of π (see vs) we have that (("info", x, X), p) ∈ queue[g]. Then the invariant 
follows from the inductive hypothesis. 

Consider now the case x ≠ v or X ≠ V. In this case, by the code, we have that s'.info-rcvd[p, g]_q ≠ 
(x, X). Since C is true in s', it must be that ("info", x, X) ∈ msgs-to-vs[g]_p or ("info", x, X) ∈ 
pending[p, g] or (("info", x, X), p) ∈ queue[g] is true in s'. Variables msgs-to-vs[g]_p, pending[p, g] 
and queue[g] are not changed by π. Hence C is true in s. The invariant follows from the 
inductive hypothesis. 

3. π = vs-gpsnd(("info", v, V))_p. 

If g ≠ client-cur.id_p then since C is true in s' it is true also in s (for the given p, q, g, x, X). 
Since the code does not change info-sent[g]_p and cur.id_p, 
the invariant follows from the inductive hypothesis. 

Hence assume that g = client-cur.id_p. First consider the case x = v and X = V. In this case, 
by the precondition of π (see dvs-impl) we have that ("info", x, X) ∈ msgs-to-vs[g]_p. Then 
the invariant follows from the inductive hypothesis. 

Consider now the case x ≠ v or X ≠ V. Since C is true in s' we have that C is true in s too. 
Indeed no ("info", x, X) message is deleted and info-rcvd[p, g]_q is not changed. The invariant 
follows from the inductive hypothesis. 

4. π = vs-order(("info", v, V), p, g). 

First consider the case x = v and X = V. In this case, by the precondition of π we have that 
("info", x, X) ∈ pending[p, g]. Then the invariant follows from the inductive hypothesis. 

Consider now the case x ≠ v or X ≠ V. Since C is true in s' we have that C is true in s too. 
Indeed no ("info", x, X) message is deleted and info-rcvd[p, g]_q is not changed. The invariant 
follows from the inductive hypothesis. 

5. Other actions. 

Condition C never changes from false to true and variables info-sent[g]_p and cur.id_p are not 
modified. Hence the assertion cannot be made false. 

□ 

The following invariant states that if a "registered" message for view v has been sent by process 
p then variable reg[v.id] p is set to true (that is, the view has been registered by the client at p). 

Invariant 4.9 (dvs-impl) 

In any reachable state, let C be the following condition: 

("registered") ∈ msgs-to-vs[g]_p or ("registered") ∈ pending[p, g] or (("registered"), p) ∈ 
queue[g] or rcvd-rgst[p, g]_q = true. 

If C is true then reg[g]_p = true. 

Proof: By induction on the length of the execution. The base case consists of proving that the 
invariant is true in the initial state. Fix p, g, q. In the initial state we have that msgs-to-vs[g]_p = λ, 
pending[p, g] = λ, queue[g] = λ and rcvd-rgst[p, g]_q = false. Hence C is false in the initial state 
and the invariant is vacuously true. 

For the inductive step assume the invariant is true in s. We need to prove that it is true in s' 
for any possible step (s, π, s'). Fix p, g, q and assume that C is true in s'. 

1. π = dvs-register_p. 

If s.client-cur.id_p ≠ g then C is true also in s and the invariant follows from the inductive 
hypothesis. Hence assume s.client-cur.id_p = g. By the code of π we have that 
s'.reg[g]_p = true. 

2. π = vs-gpsnd(("registered"))_p. 

If s.current-viewid[p] ≠ g then C is true also in s and the invariant follows from the inductive 
hypothesis. Hence assume g = s.current-viewid[p]. By Invariant 4.2 we have that s.cur.id_p = 
s.current-viewid[p]. Hence s.cur.id_p = g. By the precondition of π (see dvs-impl) we have 
that ("registered") ∈ s.msgs-to-vs[g]_p. Hence C is true in s and the invariant follows from the 
inductive hypothesis. 

3. π = vs-order(("registered"), p', g'). 

If p' ≠ p or g' ≠ g then C is true also in s and the invariant follows from the inductive 
hypothesis. Hence assume p' = p and g' = g. By the precondition of π we have that 
("registered") ∈ s.pending[p, g]. Hence C is true also in s and the invariant follows from the 
inductive hypothesis. 

4. Other actions. 

Condition C never changes from false to true and variable reg[g]_p is not modified. Hence the 
assertion cannot be made false. 

□ 

The following invariant states some facts about views in TotReg. 

Invariant 4.10 (dvs-impl) 
In any reachable state: 

1. act_p ∈ TotReg. 

2. If info-sent[g]_p = (x, X) then x ∈ TotReg. 

3. use_p ∩ TotReg ≠ {}. 

Proof: First notice that Part 3 follows easily from Part 1 and the fact that, by definition, 
act_p ∈ use_p. Hence we only need to prove Parts 1 and 2. 

The proof is by induction on the length of the execution. The base case consists of proving that the 
invariant is true in the initial state. For Part 1, fix p. In the initial state act_p = v_0 and v_0 is 
totally registered by definition. For Part 2, fix p, g. In the initial state info-sent[g]_p = ⊥. Hence 
the invariant is vacuously true. 

For the inductive step assume the invariant is true in s. We need to prove that it is true in 
s' for any possible step (s, π, s'). Fix p, g, x and X. We prove the invariant by considering each 
possible action. 

1. π = vs-newview(v)_p. 

Part 1 is still true in s' because act_p is not modified (and neither is TotReg). 

Consider Part 2 now. Assume that s'.info-sent[g]_p = (x, X). If v.id ≠ g then s.info-sent[g]_p = 
(x, X) and, by the inductive hypothesis, we have that x ∈ s.TotReg. Since no view is ever 
removed from TotReg we have that x ∈ s'.TotReg, as needed. Hence we can further assume 
that v.id = g. Since s'.info-sent[g]_p = (x, X) and action π sets info-sent[g]_p := (act_p, amb_p) it 
must be that s.act_p = x and s.amb_p = X. 

By the inductive hypothesis, Part 1, we have that s.act_p ∈ s.TotReg. But x = s.act_p and no 
view is removed from TotReg. Hence x ∈ s'.TotReg. Thus Part 2 is still true in s'. 

2. π = vs-gprcv(("info", v, V))_{p,q}. 

Consider Part 1 first. If s'.act_p = s.act_p then Part 1 follows by the inductive hypothesis. Hence 
assume that s'.act_p ≠ s.act_p. By the code we have that s'.act_p = v. Thus we have to prove that 
v ∈ TotReg. By the precondition of π (in vs) we have (("info", v, V), q) ∈ s.queue[cur.id_p]. 
Then Invariant 4.8 implies that s.info-sent[cur.id_p]_q = (v, V). By the inductive hypothesis, 
Part 2, we have that v ∈ s.TotReg, as needed. 

Part 2 is preserved because info-sent[g]_p is not modified. 

3. π = dvs-garbage-collect(v)_p. 

Consider Part 1 first. If s'.act_p = s.act_p then Part 1 follows by the inductive hypothesis. 
Hence assume that s'.act_p ≠ s.act_p. By the code we have that s'.act_p = v. Hence we have to 
prove that v ∈ TotReg. By the precondition of π we have that rcvd-rgst[q, v.id]_p = true for all 
q ∈ v.set. Then Invariant 4.9 implies that v ∈ TotReg. 

Part 2 is preserved because info-sent[g]_p is not modified. 

4. Other actions. 

Variables act_p, info-sent[g]_p (as well as TotReg) are not modified. Hence the assertions cannot 
be made false. 

□ 

The following invariant states that if process q is in a view v which has been attempted by process 
p (which may or may not be q itself) then the current view of q is either v or a later one. 

Invariant 4.11 (dvs-impl) 

In any reachable state, if v ∈ attempted_p and q ∈ v.set then cur.id_q ≥ v.id. 

Proof: By induction on the length of the execution. The base case consists of proving that the 
invariant is true in the initial state. Fix p, v and suppose that v ∈ attempted_p and q ∈ v.set. 
If p ∉ P_0 then attempted_p = {}, a contradiction. On the other hand, if p ∈ P_0 then since 
v ∈ attempted_p, it must be that v = v_0. Moreover since q ∈ v.set we have that q ∈ P_0. Hence 
cur_q = v_0, so cur.id_q ≥ v.id, as needed. 

For the inductive step assume the invariant is true in state s. We need to prove that it is true 
in s' for any possible step (s, π, s'). Fix p and v and assume that v ∈ s'.attempted_p and q ∈ v.set. 
We distinguish two cases. 

1. v ∈ s.attempted_p. 

By the inductive hypothesis we have that s.cur.id_q ≥ v.id. By the monotonicity of cur.id_q we 
have that s'.cur.id_q ≥ s.cur.id_q. 

2. v ∉ s.attempted_p. 

It must be π = dvs-newview(v)_p. We consider two possible cases: q = p and q ≠ p. 

Assume that q = p. Then Invariant 4.3 implies that s'.client-cur.id_p ≥ v.id. Since s'.cur.id_p = 
s'.client-cur.id_p, we have that s'.cur.id_p ≥ v.id, as needed. 

Assume that q ≠ p. Then the precondition of π says that s.info-rcvd[q, v.id]_p ≠ ⊥. By Invariant 
4.8 (used with p and q interchanged) we have that cur.id_q ≥ v.id, as needed. 



The following invariant states properties of views in the use set. 

Invariant 4.12 (dvs-impl) 
In any reachable state: 

1. If cur_p ≠ ⊥ and w ∈ use_p, then w.id ≤ cur.id_p. 

2. If cur_p ≠ ⊥ and client-cur_p ≠ cur_p and w ∈ use_p, then w.id < cur.id_p. 

3. If info-sent[g]_p = (x, X) and w ∈ {x} ∪ X then w.id < g. 

Proof: By induction on the length of the execution. The base case consists of proving that the 
invariant is true in the initial state. Consider Part 1 first. In the initial state we have that 
use_p is either empty or contains only v_0. In the former case Part 1 is vacuously true. In the 
latter case we have that w = v_0 and the invariant follows from the fact that g_0 is the minimum 
element of G. Parts 2 and 3 are vacuously true. Indeed in the initial state client-cur_p = cur_p and 
info-sent[g]_p = ⊥. 

For the inductive step assume the invariant is true in state s. We need to prove that it is true 
in s' for any possible step (s, π, s'). Fix p, g, x, X and w. 

We prove that the invariant is still true in s' by considering each possible action π. 

1. π = vs-newview(v)_p 

First consider Part 1. Assume that s'.cur_p ≠ ⊥ and w ∈ s'.use_p. Then w ∈ s.use_p. If 
s.cur_p = ⊥, then, by Invariant 4.7, w = v_0. Since v_0.id is the minimum element of G, we have 
that w.id < s'.cur.id_p. So assume that s.cur_p ≠ ⊥. In this case, by the inductive hypothesis, 
Part 1, we have that w.id ≤ s.cur.id_p, which implies w.id < s'.cur.id_p. 

Hence Part 1 is still true in s'. Since we actually proved that w.id < s'.cur.id_p, Part 2 is 
also still true in s'. 

Now consider Part 3. Assume that s'.info-sent[g]_p = (x, X) and w ∈ {x} ∪ X. If g ≠ v.id then 
we have that s.info-sent[g]_p = (x, X). By the inductive hypothesis, Part 3, we have w.id < g, 
as needed. Hence assume g = v.id. By the code of π, we have that s.use_p = {x} ∪ X. Now if 
s.cur_p = ⊥, then by Invariant 4.7, w = v_0. Since v_0.id is the minimum element of G, we have 
that w.id < v.id = g, as needed. So assume further that s.cur_p ≠ ⊥. In this case, the inductive 
hypothesis, Part 1, implies that w.id ≤ s.cur.id_p, which implies w.id < s'.cur.id_p = v.id = g, 
as needed. 

2. π = dvs-newview(v)_p 

Consider Part 1 first. The only possible new element added to use_p is v. Since v = s'.cur_p, 
Part 1 still holds in s'. Part 2 is vacuously true, because s'.client-cur_p = s'.cur_p. Part 3 is 
preserved because info-sent[g]_p is not modified. 

3. π = dvs-garbage-collect(v)_p 

Consider Part 1. Assume that s'.cur_p ≠ ⊥ and that w ∈ s'.use_p. By the code s'.cur_p = s.cur_p. 
If w ∈ s.use_p then by the inductive hypothesis Part 1 is true in s and thus it is still true in s'. 
Hence assume that w ∉ s.use_p. By the code, this cannot happen because no view is added to 
use_p. 

Part 2 can be proved in a similar way. Part 3 is preserved because info-sent[g]_p is not modified. 

4. π = vs-gprcv(("info", x, X))_{q,p} 

The proof is exactly as in the previous case. 

5. Other actions. 

Variables use_p, cur_p, client-cur_p and info-sent[g]_p are not modified. Hence none of the 
assertions can be made false. 

□ 

The following three invariants say that certain views appear in use sets, or in "info" messages, 
unless they have been garbage-collected. 

Invariant 4.13 (dvs-impl) 

In any reachable state, if w ∈ attempted_p then either w ∈ use_p or w.id < act.id_p. 

Proof: By induction on the length of the execution. The base case consists of proving that the 
invariant is true in the initial state. Fix p, w and suppose that w ∈ attempted_p. If p ∉ P_0 then 
attempted_p = {}, a contradiction. On the other hand, if p ∈ P_0 then since w ∈ attempted_p it 
must be that w = v_0. But in this case also act_p = v_0, so v_0 ∈ use_p, as needed. 

For the inductive step assume the invariant is true in state s. We need to prove that it is true 
in s' for any possible step (s, π, s'). So fix w and p such that w ∈ s'.attempted_p. We distinguish 
two possible cases. 

1. w ∈ s.attempted_p. 

By the inductive hypothesis we have that either w ∈ s.use_p or w.id < s.act.id_p. In the 
latter case, because of the monotonicity of act.id_p, we have w.id < s'.act.id_p. So assume that 
w ∈ s.use_p. If w ∈ s'.use_p we are done, so assume further that w ∉ s'.use_p. Then it must be 
that either π = dvs-garbage-collect(v)_p or π = vs-gprcv(("info", x, X))_{r,p} for some r. In either 
case, the code implies that s'.act.id_p > w.id. 

2. w ∉ s.attempted_p. 

It must be π = dvs-newview(v)_p. By the code, view v is inserted into attempted_p, but also into 
amb_p (and hence into use_p). Thus the invariant is still true in s'. 

□ 

Invariant 4.14 (dvs-impl) 

In any reachable state, if info-rcvd[q, g]_p = (x, X) and w ∈ {x} ∪ X, then either w ∈ use_p or 
w.id < act.id_p. 

Proof: By induction on the length of an execution. The base case consists of proving that the 
invariant is true in the initial state. In the initial state info-rcvd[q, g]_p = ⊥ for any p, q, g. Hence 
the statement is vacuously true. 

For the inductive step assume the invariant is true in state s. We need to prove that it is true 
in s' for any possible step (s, π, s'). Fix p, q, g, x, X and w, and assume that s'.info-rcvd[q, g]_p = 
(x, X), and w ∈ {x} ∪ X. We consider two cases: 

1. s.info-rcvd[q, g]_p = (x, X) 

By the statement applied to s, we obtain that either w ∈ s.use_p, or s.act.id_p > w.id. In the 
latter case, s'.act.id_p > w.id, because of the monotonicity of act.id_p. So assume that w ∈ s.use_p. 
If w ∈ s'.use_p then we are done, so assume further that w ∉ s'.use_p. (That is, w is 
garbage-collected.) 

Then it must be that either π = dvs-garbage-collect(v)_p or π = vs-gprcv(("info", x, X))_{r,p} for 
some r. In either case, the code implies that s'.act.id_p > w.id. 

2. s.info-rcvd[q, g]_p ≠ (x, X) 

Then π = vs-gprcv(("info", x, X))_{q,p}. If w ∈ s'.use_p then we are done. Hence assume that 
w ∉ s'.use_p. By the code, we have that s'.act.id_p > w.id (that is, w is garbage-collected). 

□ 

Invariant 4.15 (dvs-impl) 

In any reachable state, if info-sent[g]_p = (x, X), w ∈ attempted_p, and w.id < g, then either 
w ∈ {x} ∪ X or w.id < x.id. 

Proof: By induction on the length of an execution. The base case consists of proving that the 
invariant is true in the initial state. In the initial state, info-sent[g]_p = ⊥ for all g, p, so the 
statement is vacuously true. 

For the inductive step assume the invariant is true in state s. We need to prove that it is true in 
s' for any possible step (s, π, s'). Fix p, g, w, x, and X, and assume that s'.info-sent[g]_p = (x, X), 
w ∈ s'.attempted_p, and w.id < g. We consider four cases: 

1. s.info-sent[g]_p = (x, X) and w ∈ s.attempted_p. 

Then the statement for s implies that either w ∈ {x} ∪ X or w.id < x.id. In either case the 
statement is true in s' also. 

2. s.info-sent[g]_p ≠ (x, X) and w ∉ s.attempted_p. 

This cannot happen because both conditions cannot become true in a single step: the first only 
becomes true by means of a vs-newview(v)_p, for some view v, while the second only becomes 
true by means of dvs-newview(w)_p. 

3. s.info-sent[g]_p ≠ (x, X) and w ∈ s.attempted_p. 

It must be π = vs-newview(v)_p, for some v, x must be s.act_p, and X must be s.amb_p. 
Invariant 4.13 implies that either w ∈ s.use_p or w.id < s.act.id_p. Now, s.use_p = {s.act_p} ∪ 
s.amb_p = {x} ∪ X. So we have that either w ∈ {x} ∪ X or w.id < x.id, as needed. 

4. s.info-sent[g]_p = (x, X) and w ∉ s.attempted_p. 

Then π must be dvs-newview(w)_p. We claim that this cannot happen: Since s.info-sent[g]_p = 
(x, X), by Invariant 4.4 we have s.cur.id_p ≥ g. Since g > w.id, we have s.cur.id_p > w.id. But 
the precondition of π requires that s.cur.id_p = w.id. Hence π is not enabled in state s. 

Invariant 4.16 says that two attempted views having no intervening totally registered view,
and having a common member, q, that has attempted the first view, must intersect in a majority 
of processors. This is because, under these circumstances, information must flow from q to any 
processor that attempts the second view. 

Invariant 4.16 (dvs-impl) 

In any reachable state, suppose that v ∈ attempted_p, q ∈ v.set, w ∈ attempted_q, w.id < v.id, and
there is no x ∈ TotReg such that w.id < x.id < v.id. Then |v.set ∩ w.set| > |w.set|/2.

Proof: By induction on the length of an execution. The base case consists of proving that the
invariant is true in the initial state. In the initial state, only v_0 is attempted, so the hypotheses
cannot be satisfied. Thus, the statement is vacuously true.

For the inductive step assume the invariant is true in state s. We need to prove that it is true in
s' for any possible step (s, π, s'). Fix v, w, p, and q, and assume that v ∈ s'.attempted_p, q ∈ v.set,
w ∈ s'.attempted_q, w.id < v.id, and there is no x ∈ s'.TotReg such that w.id < x.id < v.id. Then
also there is no x ∈ s.TotReg such that w.id < x.id < v.id. We consider four cases:

1. v ∈ s.attempted_p and w ∈ s.attempted_q.

Then the statement for s implies that |v.set ∩ w.set| > |w.set|/2, as needed.

2. v ∉ s.attempted_p and w ∉ s.attempted_q.

This cannot happen because we cannot have both v and w becoming attempted in a single
step.

3. v ∉ s.attempted_p and w ∈ s.attempted_q.

Then π must be dvs-newview(v)_p. Since q ∈ v.set, by the precondition of π we have that
s.info-rcvd[q, v.id]_p = (x, X) for some x and X. Then Invariant 4.8 implies that s.info-sent[v.id]_q =
(x, X). Then (since w.id < v.id), Invariant 4.15 implies that either w ∈ {x} ∪ X or w.id < x.id.
If w.id < x.id, then we obtain a contradiction. Indeed by Invariant 4.10 x ∈ s.TotReg and by
Invariant 4.12, Part 3 (used with w = x) we have x.id < v.id. This contradicts the hypothesis.
So w ∈ {x} ∪ X.

Now by Invariant 4.14 we have that either w ∈ s.use_p or w.id < s.act.id_p. In the former case,
by the precondition of π, we have |v.set ∩ w.set| > |w.set|/2. In the latter case, we obtain a
contradiction. Indeed by Invariant 4.10 we have s.act_p ∈ TotReg. Moreover by the precondition
of π, s.cur_p cannot be ⊥ and s.cur.id_p > s.client-cur.id_p and, by definition, s.act_p ∈ s.use_p.
Hence by Invariant 4.12, Part 2, we have s.act.id_p < s.cur.id_p = v.id.

4. v ∈ s.attempted_p and w ∉ s.attempted_q.

Then π must be dvs-newview(w)_q. But this cannot happen. Indeed since v ∈ s.attempted_p
and q ∈ v.set, Invariant 4.11 implies that s.cur.id_q ≥ v.id. Since v.id > w.id, we have
s.cur.id_q > w.id. But the precondition of action π requires s.cur.id_q = w.id, so π is not
enabled in s.

□

Invariant 4.17 says that any attempted view v intersects the latest preceding totally registered
view w in a majority of members of w.

Invariant 4.17 (dvs-impl) 

In any reachable state, suppose that v ∈ Att, and w ∈ TotReg, w.id < v.id, and there is no
x ∈ TotReg such that w.id < x.id < v.id. Then |v.set ∩ w.set| > |w.set|/2.

Proof: By induction on the length of an execution. The base case consists of proving that the
invariant is true in the initial state. In the initial state, only v_0 is attempted, so the hypotheses
cannot be satisfied. Thus, the statement is vacuously true.

For the inductive step assume the invariant is true in state s. We need to prove that it is
true in s' for any possible step (s, π, s'). Fix v and w, and assume that v ∈ s'.Att, w ∈ s'.TotReg,
w.id < v.id, and there is no x ∈ s'.TotReg such that w.id < x.id < v.id. We consider four cases:

1. v ∈ s.Att and w ∈ s.TotReg.

Then, from the inductive hypothesis we have |v.set ∩ w.set| > |w.set|/2.

2. v ∉ s.Att and w ∉ s.TotReg.

This cannot happen because we cannot have both v becoming attempted and w becoming
totally registered in a single step.

3. v ∉ s.Att and w ∈ s.TotReg.

Then π must be dvs-newview(v)_p for some p. The precondition of π implies that, for any view
y ∈ s.use_p, |v.set ∩ y.set| > |y.set|/2. Hence to prove the claim it is enough to prove that
w ∈ s.use_p. We proceed by contradiction, assuming that w ∉ s.use_p.

By Invariant 4.10, Part 3, s.use_p ∩ s.TotReg ≠ {}. Let m be the view in s.use_p ∩ s.TotReg
having the biggest identifier. We know that m ≠ w because w ∉ s.use_p. Also, m ≠ v, because
m ∈ s.TotReg and v ∉ s.TotReg. It follows that m.id ≠ v.id.

We claim that m.id < w.id. We have already shown that m.id ≠ w.id. Suppose for the sake
of contradiction that m.id > w.id. From the precondition of action π we have that s.cur_p = v
and hence s.cur_p ≠ ⊥. Also from the precondition of π we have that s.client-cur.id_p < s.cur.id_p.
Since m ∈ s.use_p, Invariant 4.12, Part 2, implies that m.id < s.cur.id_p and since s.cur_p = v we
have m.id < v.id. So w.id < m.id < v.id. Since m ∈ s'.TotReg, this contradicts the
hypothesis of the inductive step. Therefore, m.id < w.id.

Let n be the view in s.TotReg that has the smallest id strictly greater than that of m.
Remember that w ∈ s'.TotReg and since π = dvs-newview(v)_p we have that w ∈ s.TotReg;
thus n exists and it holds that m.id < n.id ≤ w.id < v.id. Since m ∈ s.use_p, the precondition
of π implies that |v.set ∩ m.set| > |m.set|/2. By the statement applied to state s,
|n.set ∩ m.set| > |m.set|/2. Hence there exists a processor q ∈ v.set ∩ n.set. By the
precondition of π, s.info-rcvd[q, v.id]_p = (x, X) for some x, X. Then Invariant 4.8 implies that
s.info-sent[v.id]_q = (x, X). Then Invariant 4.12, Part 3 (used with w = x), implies that
x.id < v.id. Since n ∈ s.TotReg, we have that n ∈ s.attempted_q. Then Invariant 4.15 (used
with w = n) implies that either n ∈ {x} ∪ X or n.id < x.id. In either case, {x} ∪ X contains
a view y ∈ s.TotReg (either n or x) such that n.id ≤ y.id < v.id. Then Invariant 4.14 implies
that either y ∈ s.use_p or y.id < s.act.id_p. By Invariant 4.10, Part 1, s.act_p ∈ s.TotReg and by
definition, s.act_p ∈ s.use_p. So in either case, the hypothesis that m is the totally registered
view with the largest id belonging to s.use_p is contradicted.

4. v ∈ s.Att and w ∉ s.TotReg.

Then π must be dvs-register_p for some p. Let m be the view in s.TotReg with the largest id
that is strictly less than w.id. By the statement for s, we know that |w.set ∩ m.set| > |m.set|/2
and |v.set ∩ m.set| > |m.set|/2. Hence there is a processor q ∈ w.set ∩ v.set.

Since v ∈ s.Att, there exists a processor r such that v ∈ s.attempted_r. Thus also v ∈
s'.attempted_r. Since w ∈ s'.TotReg, we have that w ∈ s'.attempted_q. By assumption, there
is no view x ∈ s'.TotReg such that w.id < x.id < v.id. By Invariant 4.16 applied to state s'
(with p = r), we have that |v.set ∩ w.set| > |w.set|/2, as needed.

□

The final invariant, a corollary to Invariant 4.17, is instrumental in proving that DVS-IMPL 
implements DVS. 

Invariant 4.18 (dvs-impl) 

In any reachable state, if v, w ∈ Att, w.id < v.id, and there is no x ∈ TotReg with w.id < x.id <
v.id, then v.set ∩ w.set ≠ {}.

Proof: Suppose that v and w are as given. We consider two cases.

1. w ∈ TotReg.

Since there is no x ∈ TotReg with w.id < x.id < v.id, Invariant 4.17 implies that
|v.set ∩ w.set| > |w.set|/2, which implies that v.set ∩ w.set ≠ {}, as needed.

2. w ∉ TotReg.

Then let Y = {y | y ∈ TotReg, y.id < w.id}. We first show that Y is nonempty: Invariant 4.5
implies that v_0 ∈ TotReg and that v_0.id ≤ w.id. If v_0.id = w.id, then by Invariant 4.1, we have
w = v_0. But then w ∈ TotReg, a contradiction to the definition of this case. So we must have
v_0.id < w.id, which implies that v_0 ∈ Y, so Y is nonempty.

Now fix z to be the view in Y with the largest id. We have that there is no x ∈ TotReg
with z.id < x.id < v.id. Then Invariant 4.17 implies that |w.set ∩ z.set| > |z.set|/2 and
|v.set ∩ z.set| > |z.set|/2. Together, these two facts imply that v.set ∩ w.set ≠ {}, as needed.

□
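
The counting step used in case 2 of this proof (two sets that each contain a strict majority of z.set must share a member) can be checked mechanically on small sets. The following is a minimal Python sketch, not part of the formal development; the function names majority_of and check_majorities_intersect are hypothetical, and processors are modelled as plain integers.

```python
from itertools import combinations


def majority_of(a: frozenset, z: frozenset) -> bool:
    """True if a contains a strict majority of the members of z."""
    return len(a & z) > len(z) / 2


def check_majorities_intersect(z: frozenset) -> bool:
    """Exhaustively check that any two subsets of z that each hold a
    strict majority of z have a nonempty intersection."""
    members = sorted(z)
    subsets = [
        frozenset(c)
        for r in range(len(members) + 1)
        for c in combinations(members, r)
    ]
    majorities = [a for a in subsets if majority_of(a, z)]
    # An empty intersection is falsy, so all(...) fails if any pair is disjoint.
    return all(a & b for a in majorities for b in majorities)
```

In the proof, v.set and w.set need not be subsets of z.set, but v.set ∩ z.set and w.set ∩ z.set are, so checking subsets of z covers the argument.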



4.3.2 The abstraction function 



We prove that dvs-impl implements DVS by defining a function ℱ that maps states of dvs-impl
to states of DVS and proving that this function is an abstraction function. Section 4.3.2.1 contains
the definition of the function ℱ along with auxiliary invariants, then Section 4.3.2.2 gives the
proof that ℱ is an abstraction function.

4.3.2.1 Definition of ℱ

dvs-impl uses VS to send client messages and messages generated by the implementation ("info"
and "registered" messages). The abstraction function discards the non-client messages. Thus, if q
is a finite sequence of client and non-client messages, we define purge(q) to be the queue obtained
by deleting any "info" or "registered" messages from q, and purgesize(q) to be the number of
"info" and "registered" messages in q. Figure 5 defines the abstraction function ℱ.



Let s be a state of dvs-impl. The state t = ℱ(s) of dvs is the following.

• t.created = ⋃_{p ∈ P} s.attempted_p

• for each p ∈ P, t.current-viewid[p] = s.client-cur.id_p

• for each g ∈ G, t.attempted[g] = {p | g = v.id, v ∈ s.attempted_p}

• for each g ∈ G, t.registered[g] = {p | s.reg[g]_p}

• for each p ∈ P, g ∈ G, t.pending[p, g] = purge(s.pending[p, g]) + purge(s.msgs-to-vs[g]_p)

• for each g ∈ G, t.queue[g] = purge(s.queue[g])

• for each p ∈ P, g ∈ G,
  t.next[p, g] = s.next[p, g] − purgesize(s.queue[g](1..next[p, g] − 1)) − |s.msgs-from-vs[g]_p|

• for each p ∈ P, g ∈ G,
  t.next-safe[p, g] = s.next-safe[p, g] − purgesize(s.queue[g](1..next-safe[p, g] − 1)) − |s.safe-from-vs[g]_p|

Figure 5: The abstraction function ℱ.
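
To make the pointer arithmetic in the figure concrete, here is a minimal Python sketch of purge, purgesize and the computation of t.next[p, g]. The encoding is an assumption made only for illustration: a queue entry is a (message, sender) pair, and implementation messages are tuples whose first field is "info" or "registered". The helper names is_internal and abstract_next are hypothetical.

```python
from typing import List, Tuple

# Assumed encoding: (message, sender); not taken from the paper.
Entry = Tuple[object, str]


def is_internal(msg: object) -> bool:
    """True for the implementation's own "info" / "registered" messages."""
    return isinstance(msg, tuple) and len(msg) > 0 and msg[0] in ("info", "registered")


def purge(queue: List[Entry]) -> List[Entry]:
    """Delete all "info" and "registered" messages, keeping client messages in order."""
    return [(m, q) for (m, q) in queue if not is_internal(m)]


def purgesize(queue: List[Entry]) -> int:
    """Number of "info" and "registered" messages in the queue."""
    return sum(1 for (m, _) in queue if is_internal(m))


def abstract_next(s_next: int, vs_queue: List[Entry], msgs_from_vs: List[Entry]) -> int:
    """t.next[p, g]: the 1-based VS pointer, minus the internal messages already
    passed, minus the client messages buffered but not yet handed to the client."""
    delivered_prefix = vs_queue[: s_next - 1]
    return s_next - purgesize(delivered_prefix) - len(msgs_from_vs)
```

For example, if the VS queue holds an "info" message, client message a, a "registered" message and client message b, and VS has delivered the first three entries to p while a is still buffered in msgs-from-vs, then the abstract pointer is 4 − 2 − 1 = 1, which indexes a in the purged queue.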

Next we give some simple consequences of the definition of ℱ. They deal with the messages
delivered by dvs-impl. They state that these messages are exactly the ones that dvs would
deliver to the client.

Invariant 4.19 (dvs-impl)

In any reachable state s, if s.msgs-from-vs[g]_p = ((m_1, q_1), (m_2, q_2), ..., (m_k, q_k)), then we have
that ℱ(s).queue[g](next[p, g]..next[p, g] + k − 1) = ((m_1, q_1), (m_2, q_2), ..., (m_k, q_k)).

Proof: By induction on the length of the execution. The base case consists of proving that the
invariant is true in the initial state. In the initial state no message is in msgs-from-vs[g]_p. Hence
the invariant is vacuously true.

For the inductive step, assume that the invariant is true in state s. We need to prove that it is
true in state s' for any possible step (s, π, s'). Fix p, g and m_1, q_1, m_2, q_2, ..., m_k, q_k and assume
that s'.msgs-from-vs[g]_p = ((m_1, q_1), (m_2, q_2), ..., (m_k, q_k)). We distinguish the following cases.

1. s.msgs-from-vs[g]_p = ((m_1, q_1), ..., (m_{k−1}, q_{k−1})).

It must be that π = vs-gprcv(m_k)_{q_k,p}. By the inductive hypothesis we have that
ℱ(s).queue[g](next[p, g]..next[p, g] + k − 2) = ((m_1, q_1), ..., (m_{k−1}, q_{k−1})).
By the code in vs we have that next[p, g] is increased by one and by the code in dvs we
have that the size of msgs-from-vs[g]_p also increases by one. Hence by the definition of ℱ, we
have that ℱ(s').next[p, g] = ℱ(s).next[p, g]. Moreover ℱ(s').queue[g] = ℱ(s).queue[g] and by
the precondition of π we have that ℱ(s).queue[g](s.next[p, g] + k − 1) = (m_k, q_k). Thus the
invariant is still true in s'.

2. s.msgs-from-vs[g]_p = ((m, q), (m_1, q_1), (m_2, q_2), ..., (m_k, q_k)).

Then π = dvs-gprcv(m)_{q,p}. By the inductive hypothesis we have that
ℱ(s).queue[g](next[p, g]..next[p, g] + k) = ((m, q), (m_1, q_1), (m_2, q_2), ..., (m_k, q_k)).

By the code we have that next[p, g] is incremented by one. Since ℱ(s').queue[g] = ℱ(s).queue[g],
the invariant is still true in s'.

3. s.msgs-from-vs[g]_p = s'.msgs-from-vs[g]_p

By the inductive hypothesis the assertion is true in state s. For any possible action in this case
ℱ(s').next[p, g] = ℱ(s).next[p, g] and the portion of ℱ(s).queue[g] involved in the statement
of the invariant never changes because messages are only appended to queue[g]. Thus the
assertion cannot be made false.

4. Other cases.

Not possible. Indeed msgs-from-vs[g]_p either stays the same or is changed by appending a
message or deleting the head.

□

The following invariant follows easily from the previous one. It just states that the next message 
delivered by dvs-impl to a processor p is the same one that dvs delivers. 

Invariant 4.20 (dvs-impl) 

In any reachable state s, if (m, q) is head of s.msgs-from-vs[g]_p, then ℱ(s).queue[g](next[p, g]) =
(m, q).

Proof: Follows easily from the previous one. □

Similar invariants hold for the delivery of safe messages.

Invariant 4.21 (dvs-impl)

In any reachable state s, we have that if s.safe-from-vs[g]_p = ((m_1, q_1), (m_2, q_2), ..., (m_k, q_k)), then
ℱ(s).queue[g](next-safe[p, g]..next-safe[p, g] + k − 1) = ((m_1, q_1), (m_2, q_2), ..., (m_k, q_k)).

Proof: The proof is as for Invariant 4.19 except that it uses the safe-from-vs queue instead of
msgs-from-vs and the pointer next-safe instead of next. □

Invariant 4.22 (dvs-impl)

In any reachable state s, if (m, q) is head of s.safe-from-vs[g]_p, then ℱ(s).queue[g](next-safe[p, g]) =
(m, q).

Proof: Follows easily from the previous one. □

Notice that v is totally registered in state s of dvs-impl if and only if it is totally registered in
the state ℱ(s) of dvs.

4.3.2.2 Proof that ℱ is an abstraction function

In order to prove that ℱ is an abstraction function we need to prove that (a) for any initial state
s of dvs-impl we have that ℱ(s) is an initial state of dvs, and that (b) for any possible step π
of dvs-impl there exists an execution fragment α of dvs such that the trace of α is equal to the
trace of π, that is, α and π have identical externally observable behaviors. Lemmas 4.23 and 4.24
prove this.

Lemma 4.23 If s is an initial state of dvs-impl then ℱ(s) is an initial state of dvs.

Proof: Let s_0 be the unique initial state of dvs-impl and t_0 the unique initial state of dvs.

We have s_0.attempted_p = {v_0} for p ∈ P_0 and s_0.attempted_p = {} for p ∉ P_0. By the definition
of ℱ and the fact that P_0 ≠ {} (because all membership sets are defined to be nonempty), we
have ℱ(s_0).created = {v_0}. This is as in t_0.

We have s_0.client-cur_p = v_0 for p ∈ P_0 and s_0.client-cur_p = ⊥ for p ∉ P_0. By the definition
of ℱ we have ℱ(s_0).current-viewid[p] = g_0 for p ∈ P_0 and ℱ(s_0).current-viewid[p] = ⊥ for p ∉ P_0.
This is as in t_0.

We have s_0.attempted_p = {v_0} for p ∈ P_0 and s_0.attempted_p = {} for p ∉ P_0. By the definition
of ℱ we have ℱ(s_0).attempted[g_0] = P_0 and ℱ(s_0).attempted[g] = {} for g ≠ g_0. This is as in t_0.

Let g ∈ G. We have that s_0.reg[g]_p is true if and only if p ∈ P_0 and g = g_0. By the definition
of ℱ we have ℱ(s_0).registered[g_0] = P_0 and ℱ(s_0).registered[g] = {} for g ≠ g_0, as in t_0.

Let p ∈ P, g ∈ G. We have that s_0.msgs-to-vs[g]_p = λ and s_0.pending[p, g] = λ. By the definition of
ℱ we have ℱ(s_0).pending[p, g] = λ, as in t_0.

Let g ∈ G. We have s_0.queue[g] = λ. By the definition of ℱ we have ℱ(s_0).queue[g] = λ,
as in t_0.

Let p ∈ P, g ∈ G. We have s_0.next[p, g] = 1, purgesize(s_0.queue[g]) = 0 and s_0.msgs-from-vs[g]_p =
λ. By the definition of ℱ we have ℱ(s_0).next[p, g] = 1, as in t_0. A similar argument holds for
next-safe.

Thus ℱ(s_0) = t_0, as needed. □

Lemma 4.24 Let s be a reachable state of dvs-impl, ℱ(s) a reachable state of dvs, and (s, π, s')
a step of dvs-impl. Then there is an execution fragment α of dvs that goes from ℱ(s) to ℱ(s'),
such that trace(α) = trace(π).

Proof: By case analysis based on the type of the action π. (The only interesting case is where
π = dvs-newview(v)_p.) Define t = ℱ(s) and t' = ℱ(s').

1. π = vs-createview(v)

Then trace((s, π, s')) = λ. Action π modifies created. The definition of ℱ is not sensitive to
this change. Therefore, t = t', and we set α = t.

2. π = vs-newview(v)_p

Then trace((s, π, s')) = λ. Action π modifies cur_p, info-sent[cur.id]_p, and current-viewid[p],
and adds an "info" message to msgs-to-vs[cur.id]_p. The definition of ℱ is not sensitive to any
of these changes. Therefore, t = t', and we set α = t.

3. π = vs-gpsnd(m)_p

Then trace((s, π, s')) = λ. Action π just moves a message from the queue msgs-to-vs[cur.id]_p
to the queue pending[p, current-viewid[p]]. The definition of ℱ is not sensitive to this change.
Therefore, t = t', and we set α = t.

4. π = vs-order(m, p, g)

Then trace((s, π, s')) = λ. Action π moves a message from pending[p, g] to queue[g]. We
consider two cases.

(a) m ∈ M_c

Then we set α = (t, dvs-order(m, p, g), t'). We claim that dvs-order(m, p, g) is enabled in t:
Since vs-order(m, p, g) is enabled in s, it follows that m is the head of s.pending[p, g]. By
the definition of ℱ, m is also the head of t.pending[p, g]. It follows that dvs-order(m, p, g)
is enabled in t.

By definition of ℱ, t' differs from t only in the fact that m is moved from pending[p, g] to
queue[g]. This is the effect achieved by applying dvs-order(m, p, g) to t.

(b) m ∉ M_c

Then the definition of ℱ is not sensitive to this change. Therefore, t = t', and we set
α = t.

5. π = vs-gprcv(("info", v, S))_{q,p}

Then trace((s, π, s')) = λ. This action can modify info-rcvd[q, cur.id_p]_p, act_p and amb_p (see
code of dvs) and causes next[p, cur.id_p] to be incremented (see code of vs). The definition of ℱ
is not sensitive to these changes. (The only interesting case is the definition of t.next[p, cur.id_p],
where the absolute values of the first two terms on the right-hand side are both increased by
1, but they cancel each other out.) Therefore, t = t', and we set α = t.

6. π = vs-gprcv("registered")_{q,p}

Then trace((s, π, s')) = λ. This action can modify rcvd-rgst[cur.id, q]_p. It also causes the pointer
next[p, cur.id_p] to be incremented. The definition of ℱ is not sensitive to these changes. (The
only interesting case is the definition of t.next[p, cur.id_p], where the absolute values of the first
two terms on the right-hand side are both increased by 1, but they cancel each other out.)
Therefore, t = t', and we set α = t.

7. π = vs-gprcv(m)_{q,p}, m ∈ M_c

Then trace((s, π, s')) = λ. This action copies a message from the sequence queue[cur.id_p] to
the sequence msgs-from-vs[cur.id_p]_p, and causes next[p, cur.id_p] to be incremented. The
definition of ℱ is not sensitive to these changes. (The only interesting case is the definition of
t.next[p, cur.id_p], where the absolute values of the first and third terms on the right-hand side
are both increased by 1, but they cancel each other out.) Therefore, t = t', and we set α = t.

8. π = vs-safe((m, v, S))_{q,p}, m ∈ {"info", "registered"}

Then trace((s, π, s')) = λ. Action π just causes next-safe[p, cur.id_p] to be incremented. The
definition of ℱ is not sensitive to this change. (The only interesting case is the definition of
t.next-safe[p, cur.id_p], where the absolute values of the first two terms on the right-hand side
are both increased by 1, but they cancel each other out.) Therefore, t = t', and we set α = t.

9. π = vs-safe(m)_{q,p}, m ∈ M_c

Then trace((s, π, s')) = λ. Action π adds a message to safe-from-vs[cur.id]_p and causes the
pointer next-safe[p, cur.id_p] to be incremented. The definition of ℱ is not sensitive to these
changes. (The only interesting case is the definition of t.next-safe[p, cur.id_p], where the
absolute values of the first and third terms on the right-hand side are both increased by 1, but
they cancel each other out.) Therefore, t = t', and we set α = t.

10. π = dvs-newview(v)_p

Then trace((s, π, s')) = π. In dvs-impl, this action modifies only variables amb_p, attempted_p,
client-cur_p. We have s'.client-cur_p = v and s'.attempted_p = s.attempted_p ∪ {v}. By definition
of ℱ, we have that t'.current-viewid[p] = s'.client-cur.id_p = v.id, t'.created = t.created ∪ {v}
and t'.attempted[v.id] = t.attempted[v.id] ∪ {p}, while all other state variables in t' are as in t.

We consider two cases:

(a) v ∈ t.created.

In this case, we set α = (t, π', t'), where π' = dvs-newview(v)_p. The code shows that π'
brings dvs from state t to state t'. It remains to prove that π' is enabled in state t, that
is, that v ∈ t.created and v.id > t.current-viewid[p]. The first of these two conditions is
true because of the defining condition for this case. The second condition follows from
the precondition of π in dvs-impl: this precondition implies that v.id > s.client-cur.id_p,
and by the definition of ℱ we have t.current-viewid[p] = s.client-cur.id_p.

(b) v ∉ t.created.

In this case we set α = (t, π', t'', π'', t'), where π' = dvs-createview(v), π'' = dvs-newview(v)_p,
and t'' is the unique state that arises by running the effect of π' from t. The code shows
that α brings dvs from state t to state t'. It remains to prove that π' is enabled in t and
that π'' is enabled in t''.

The precondition of π' requires that (i) ∀w ∈ t.created, v.id ≠ w.id and (ii) ∀w ∈ t.created,
either ∃x ∈ s.TotReg satisfying w.id < x.id < v.id or v.id < x.id < w.id, or else v.set ∩
w.set ≠ {}.

To see requirement (i), suppose for the sake of contradiction that w ∈ t.created and
w.id = v.id. The precondition of π in dvs-impl implies that v = s.cur_p, which implies
that v ∈ s.created. Since w ∈ t.created, the definition of ℱ implies that w ∈ s.attempted_q
for some q. This implies that w ∈ s.created. But then Invariant 4.1 implies that v = w.
But this contradicts the fact that v ∉ t.created and w ∈ t.created.

To see requirement (ii), suppose that w ∈ t.created and there is no x ∈ s.TotReg satisfying
w.id < x.id < v.id or v.id < x.id < w.id. Since w ∈ t.created, by definition of ℱ,
w ∈ s.attempted_q for some q. Clearly, w ∈ s'.attempted_q. Therefore, w ∈ s'.Att. By the
code of π we have that v ∈ s'.attempted_p. Therefore we also have v ∈ s'.Att. Moreover,
there is no x ∈ s'.TotReg satisfying w.id < x.id < v.id or v.id < x.id < w.id. Then
Invariant 4.18 implies that v.set ∩ w.set ≠ {}, as needed to prove that π' is enabled in t.

We now prove that π'' is enabled in state t''. The precondition of π'' requires that
v ∈ t''.created and v.id > t''.current-viewid[p]. The first condition is true because v
is added to created by π'. The second condition follows from the precondition of π in
dvs-impl: The precondition of π implies that v.id > s.client-cur.id_p. The definition of
ℱ implies that t.current-viewid[p] = s.client-cur.id_p. Moreover, t''.current-viewid[p] =
t.current-viewid[p]. It follows that v.id > t''.current-viewid[p]. Thus π'' is enabled in state
t''.

11. π = dvs-register_p

Then trace((s, π, s')) = π. Let g be s.client-cur.id_p, which equals t.current-viewid[p] by the
abstraction function. If g = ⊥, then π has no effect in dvs-impl, so s = s'; thus t = t', as
required to show that π brings dvs from t to t'. Otherwise, g ≠ ⊥, so by the code in dvs-impl,
this action sets reg[g]_p to true and inserts a "registered" message into msgs-to-vs[g]_p.
By definition of ℱ, t' is the same as t except that t'.registered[g] = t.registered[g] ∪ {p}. We
set α = (t, dvs-register_p, t'). It is easy to check that dvs-register_p brings dvs from t to t'.

12. π = dvs-garbagecollect(v)_p

Then trace((s, π, s')) = λ. This action can modify act_p and amb_p. The definition of ℱ is not
sensitive to these changes. Therefore, t = t', and we set α = t.

13. π = dvs-gpsnd(m)_p

Then trace((s, π, s')) = π. We set α = (t, dvs-gpsnd(m)_p, t'). We consider two cases:

(a) s.client-cur.id_p = ⊥

Then s = s'. In this case, the definition of ℱ implies that also t.current-viewid[p] = ⊥,
which implies that the action also has no effect in t, which suffices.

(b) s.client-cur.id_p ≠ ⊥

In this case, the action appends m to msgs-to-vs[g]_p, where g = client-cur.id_p. Hence
we have that s'.msgs-to-vs[g]_p = s.msgs-to-vs[g]_p + m. By the definition of ℱ we get that
t'.pending[p, g] = t.pending[p, g] + m. This is the effect of the action in t (using the fact
that t.current-viewid[p] ≠ ⊥).

14. π = dvs-gprcv(m)_{q,p}

Then trace((s, π, s')) = π. This action removes the head of msgs-from-vs[g]_p, where g =
cur.id_p. We have that s.msgs-from-vs[g]_p = m + s'.msgs-from-vs[g]_p. Thus t'.next[p, g] =
t.next[p, g] + 1. We set α = (t, dvs-gprcv(m)_{q,p}, t'). It is easy to check that the step has the
required effect in dvs. The fact that dvs-gprcv(m)_{q,p} is enabled in t follows from Invariant 4.20.

15. π = dvs-safe(m)_{q,p}

Then trace((s, π, s')) = π. This action removes the head of safe-from-vs[g]_p, where g = cur.id_p.
We have that s.safe-from-vs[g]_p = m + s'.safe-from-vs[g]_p. Thus t'.next-safe[p, g] =
t.next-safe[p, g] + 1. We set α = (t, dvs-safe(m)_{q,p}, t'). It is easy to check that the step has
the required effect in dvs. The fact that dvs-safe(m)_{q,p} is enabled in t follows from Invariant 4.22.

□

Lemmas 4.23 and 4.24 prove that ℱ is an abstraction function from dvs-impl to DVS and thus
the following theorem holds (this is a standard inference, cf. [29]).

Theorem 4.25 Every trace of dvs-impl is a trace of dvs.

5 An application of dvs 

Now we demonstrate the utility of DVS by showing how to use it to implement a totally ordered 
broadcast service, called TO, originally defined in [17]. This service accepts messages from clients 
and delivers them to all clients according to the same total order. This kind of service is used as a 
building block for many fault-tolerant distributed applications, e.g., in implementing sequentially- 
consistent shared memory and atomic shared memory. The TO specification is reproduced in 
Figure 6. 



Signature:

Input:    BCAST(a)_p, a ∈ A, p ∈ P
Output:   BRCV(a)_{p,q}, a ∈ A, p, q ∈ P
Internal: TO-ORDER(a, p), a ∈ A, p ∈ P

State:

queue ∈ seqof(A × P), init λ
for each p ∈ P: pending[p] ∈ seqof(A), init λ
                next[p] ∈ N^{>0}, init 1

Transitions:

input BCAST(a)_p
  Eff: append a to pending[p]

internal TO-ORDER(a, p)
  Pre: a is head of pending[p]
  Eff: remove head of pending[p]
       append (a, p) to queue

output BRCV(a)_{p,q}
  Pre: queue(next[q]) = (a, p)
  Eff: next[q] := next[q] + 1

Figure 6: The TO service
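
As an informal illustration of the specification in Figure 6, the following Python sketch models the TO state and its three transitions directly. It is only a reading aid under stated assumptions: the class name TOService, the use of strings for messages and processor identifiers, and raising ValueError on a failed precondition are all choices made for this sketch, not part of the specification.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class TOService:
    """Toy model of the TO automaton: one global queue, per-process pending and next."""

    def __init__(self) -> None:
        self.queue: List[Tuple[str, str]] = []            # sequence of (a, p)
        self.pending: Dict[str, List[str]] = defaultdict(list)
        self.next: Dict[str, int] = defaultdict(lambda: 1)  # 1-based pointers

    def bcast(self, a: str, p: str) -> None:
        """input BCAST(a)_p: append a to pending[p]."""
        self.pending[p].append(a)

    def to_order(self, a: str, p: str) -> None:
        """internal TO-ORDER(a, p): move the head of pending[p] to the queue."""
        if not self.pending[p] or self.pending[p][0] != a:
            raise ValueError("precondition failed: a is not head of pending[p]")
        self.pending[p].pop(0)
        self.queue.append((a, p))

    def enabled_brcv(self, q: str) -> Optional[Tuple[str, str]]:
        """Return the (a, p) that BRCV could deliver to q now, or None."""
        i = self.next[q]
        return self.queue[i - 1] if i <= len(self.queue) else None

    def brcv(self, q: str) -> Tuple[str, str]:
        """output BRCV(a)_{p,q}: deliver queue(next[q]) to q and advance next[q]."""
        entry = self.enabled_brcv(q)
        if entry is None:
            raise ValueError("precondition failed: no message at queue(next[q])")
        self.next[q] += 1
        return entry
```

Because every receiver reads the same queue through its own pointer, all receivers observe the same sequence, whatever order the TO-ORDER steps happened to choose.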

5.1 The implementation to-impl 

We provide an implementation of TO using DVS as a building block. The implementation is similar 
to the TO implementation provided in [17]. Both algorithms rely on primary views to establish a 
total order of client messages. The difference is that the algorithm in [17] uses a static notion of 
primary and our new algorithm uses a dynamic notion. The algorithm of [17] is built upon the 
VS service, defined in the same paper, that reports non-primary as well as primary views; that 
algorithm uses a simple local test to determine if the view is primary, namely it checks whether the 
view contains a majority of the processes; the algorithm also does some non-critical background 
work (gossiping information) in non-primary views. In contrast, the algorithm we present here is
built upon the DVS service, which only reports primary views. Thus the new algorithm is simpler 
in that it does not perform the local tests and does not carry out any processing in non-primary 
views. On the other hand, in the new algorithm, the application programs must perform dvs- 
register actions to tell the DVS service that they have obtained whatever information they need 
to proceed with regular computation in the new view. The corresponding notion in [17] is that of 
an established view: an established view in the algorithm of [17] corresponds to a registered view 
in our algorithm. Although the new algorithm appears very similar to the one of [17], the fact 
that the DVS service provides more complicated guarantees than the VS service makes the new 
algorithm harder to prove correct. 

The to-impl algorithm involves normal and recovery activity. Normal activity occurs while 
a group view is not changing. Recovery activity begins when a new primary view is presented 
by DVS, and continues while the members combine information from their previous history, to 
provide a consistent basis for ongoing normal activity. 

During normal activity, each client message received by TO-IMPL is given a system-wide unique 
label, which consists of a view identifier (the one of the view in which the message is received), a 
sequence number and the identifier of the process receiving the message. The association between 
client messages and their unique labels is recorded in a relation content and communicated to 
other processes in the same view using DVS. When a message is received, the label is given an 
order, a tentative position in the system-wide total order the service is to provide. When client
messages have been reported as delivered to all the members of the view, by the "safe" notification 
of DVS, the label and its order may become confirmed. The messages associated with confirmed 
labels may be released to the clients in the given order. 

The consistent sequence of message delivery within each view keeps this tentative order
consistent at members of a given view, but it may not be consistent between processes in different
views. To avoid inconsistencies, processes need a state exchange at the beginning of a new view.

When a new primary view is reported by DVS, recovery activity occurs to integrate the knowl- 
edge of different members. First, each member of a new view sends a message, using DVS, that 
contains a summary of that node's state. The summary of a node's state contains the following
information: the association of labels with client messages, stored in content; the order of client
messages to be reported to the clients, stored in order; a pointer to the next client message to
be confirmed, stored in nextconfirm; and the view identifier of the primary view with the highest
view identifier in which the order sequence has been modified, stored in highprimary.

Once a node has received all members' state summaries, it processes the information in one
atomic step, i.e., it registers the new view using the dvs-register action. The node processes
state information as follows: it defines its confirmed labels to be the longest prefix of confirmed
labels known in any of the summaries; it determines the representatives as the members whose
summaries include the greatest highprimary value; and it adopts as its new order the order of a
"chosen" representative (the chosen representative is arbitrary but must be the same for all
processes) extended with all other labels appearing in any of the received summaries, arranged
in label order.

Then recovery continues by collecting the DVS safe indications. Once the state exchange is
safe, all labels used in the exchange are marked as safe and all associated messages are confirmed
just as in normal processing.

Signature:

Input:    BCAST(a)_p, a ∈ A
          DVS-GPRCV(m)_{q,p}, q ∈ P, m ∈ C ∪ S
          DVS-SAFE(m)_{q,p}, q ∈ P, m ∈ C ∪ S
          DVS-NEWVIEW(v)_p, v ∈ V
Output:   DVS-REGISTER_p
          DVS-GPSND(m)_p, m ∈ C ∪ S
          BRCV(a)_{q,p}, a ∈ A, q ∈ P
Internal: CONFIRM_p

State:

current ∈ V_⊥, init v_0 if p ∈ P_0, ⊥ else
status ∈ {normal, send, collect, established}, init normal
content ∈ 2^C, init {}
nextseqno ∈ N^{>0}, init 1
buffer ∈ seqof(L), init λ
safe-labels ∈ 2^L, init {}
order ∈ seqof(L), init λ
nextconfirm ∈ N^{>0}, init 1
nextreport ∈ N^{>0}, init 1
highprimary ∈ G, init g_0 if p ∈ P_0, ⊥ else
gotstate, a partial function from P to S, init {}
safe-exch ⊆ P, init {}
registered ⊆ G, init {g_0} if p ∈ P_0, {} else
delay ∈ seqof(A), init λ

Figure 7: dvs-to-to_p, signature and states

For the code, shown in Figures 7 and 8, we need the following definitions. L = G × N^{>0} × P
is the set of labels, with selectors l.id, l.seqno and l.origin. A is the set of messages that can be
sent by the clients of the TO service. C = L × A is the set of possible associations between labels
and client messages. S = 2^C × seqof(L) × N^{>0} × G is the set of summaries, with selectors x.con,
x.ord, x.next and x.high. Given x ∈ S, x.confirm is the prefix of x.ord such that |x.confirm| =
min(x.next - 1, |x.ord|). If Y is a partial function from processor ids to summaries, then we
define:

• knowncontent(Y) = ∪_{q ∈ dom(Y)} Y(q).con,

• maxprimary(Y) = max_{q ∈ dom(Y)} {Y(q).high},

• maxnextconfirm(Y) = max_{q ∈ dom(Y)} {Y(q).next},

• reps(Y) = {q ∈ dom(Y) : Y(q).high = maxprimary(Y)},

• chosenrep(Y) = some element in reps(Y),

• shortorder(Y) = Y(chosenrep(Y)).ord, and

• fullorder(Y) = shortorder(Y) followed by the remaining elements of dom(knowncontent(Y)),
in label order.
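
The definitions above can be made concrete with a minimal Python sketch. It is only an illustration under stated assumptions: labels are modeled as tuples (view id, sequence number, origin) and the label order is taken to be Python's lexicographic tuple order; chosenrep is fixed to the smallest process id, which is one arbitrary rule that all processes could share. The class name Summary and this choice of rule are ours and do not come from the automaton code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    con: frozenset   # set of (label, message) pairs
    ord: tuple       # sequence of labels
    next: int        # 1-based index of the next label to confirm
    high: int        # id of the highest primary view in which ord was modified

    @property
    def confirm(self):
        # Prefix of ord whose length is min(next - 1, |ord|).
        return self.ord[:min(self.next - 1, len(self.ord))]


def knowncontent(Y):
    # Union of the con components of all summaries in Y.
    return frozenset().union(*(x.con for x in Y.values()))


def maxprimary(Y):
    return max(x.high for x in Y.values())


def maxnextconfirm(Y):
    return max(x.next for x in Y.values())


def reps(Y):
    top = maxprimary(Y)
    return {q for q, x in Y.items() if x.high == top}


def chosenrep(Y):
    # ASSUMPTION: any deterministic rule would do; we pick the smallest id.
    return min(reps(Y))


def shortorder(Y):
    return Y[chosenrep(Y)].ord


def fullorder(Y):
    short = shortorder(Y)
    known_labels = {label for (label, _msg) in knowncontent(Y)}
    # Remaining labels are appended in (assumed lexicographic) label order.
    rest = sorted(known_labels - set(short))
    return short + tuple(rest)
```

Here Y is a dictionary from process ids to Summary values, playing the role of gotstate. Because every process applies the same deterministic chosenrep rule to the same set of summaries, all of them compute the same fullorder.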

We define the system to-impl to be the composition of the automata dvs-to-to_p, for each
p ∈ P, and the dvs specification, with all the external actions of dvs hidden.

Transitions:

  input BCAST(a)_p
    Eff: append a to delay

  internal LABEL(a)_p
    Pre: a is head of delay
         current ≠ ⊥
    Eff: let l be (current.id, nextseqno, p)
         content := content ∪ {(l, a)}
         append l to buffer
         nextseqno := nextseqno + 1
         delete head of delay

  output DVS-GPSND((l, a))_p
    Pre: status = normal
         l is head of buffer
         (l, a) ∈ content
    Eff: delete head of buffer

  input DVS-GPRCV((l, a))_{q,p}
    Eff: content := content ∪ {(l, a)}
         order := order + l

  input DVS-SAFE((l, a))_{q,p}
    Eff: safe-labels := safe-labels ∪ {l}

  internal CONFIRM_p
    Pre: order(nextconfirm) ∈ safe-labels
    Eff: nextconfirm := nextconfirm + 1

  output BRCV(a)_{q,p}
    Pre: nextreport < nextconfirm
         (order(nextreport), a) ∈ content
         q = order(nextreport).origin
    Eff: nextreport := nextreport + 1

  input DVS-NEWVIEW(v)_p
    Eff: current := v
         nextseqno := 1
         buffer := λ
         gotstate := {}
         safe-exch := {}
         safe-labels := {}
         status := send

  output DVS-GPSND(x)_p
    Pre: status = send
         x = (content, order, nextconfirm, highprimary)
    Eff: status := collect

  input DVS-GPRCV(x)_{q,p}
    Eff: content := content ∪ x.con
         gotstate := gotstate ⊕ (q, x)
         if (dom(gotstate) = current.set) ∧ (status = collect) then
           status := established

  output DVS-REGISTER_p
    Pre: status = established
         current.id ∉ registered
    Eff: registered := registered ∪ {current.id}
         nextconfirm := maxnextconfirm(gotstate)
         order := fullorder(gotstate)
         highprimary := current.id
         status := normal

  input DVS-SAFE(x)_{q,p}
    Eff: safe-exch := safe-exch ∪ {q}
         if safe-exch = current.set then
           safe-labels := safe-labels ∪ range(fullorder(gotstate))

Figure 8: DVS-TO-TO_p, transitions

Following the approach in [17], we define the derived variables allstate, allcontent and allconfirm
for to-impl as follows.

• We write allstate[p, g] to denote a set of summaries, defined so that x ∈ allstate[p, g] if and
only if at least one of the following holds:

1. current.id_p = g and x = (content_p, order_p, nextconfirm_p, highprimary_p).

2. x ∈ pending[p, g].

3. (x, p) ∈ queue[g].

4. For some q, current.id_q = g and x = gotstate(p)_q.

Thus, allstate[p, g] consists of all the summary information that is in the state of p if p's
current view is g, plus all the summary information that has been sent out by p in state
exchange messages in view g and is now remembered elsewhere among the state components
of to-impl. Notice that allstate[p, g] consists only of summaries: an ordinary message (l, a)
is never an element of allstate[p, g]. We write allstate[g] to denote ∪_{p ∈ P} allstate[p, g], and
allstate to denote ∪_{g ∈ G} allstate[g].

• We write allcontent for ∪_{x ∈ allstate} x.con.

This represents all the information available anywhere that links a label with a corresponding
data value.

• We write allconfirm for lub_{x ∈ allstate} (x.confirm).

For every p ∈ P, g ∈ G, buildorder[p, g] is defined to be a sequence of labels, initially empty;
this variable is maintained by following every statement of processor p that assigns to order
with another statement buildorder[p, current.id_p] := order. It follows that if p registers a
view with id g, and later leaves view g for a view with a higher view identifier, then forever
afterwards, buildorder[p, g] remembers the value of order_p at the point where p left view g.
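
The least upper bound used to define allconfirm is taken in the prefix order on sequences. A minimal Python sketch of that operation follows, assuming sequences are represented as tuples; the helper names is_prefix and lub are ours. The least upper bound exists only when the sequences are pairwise prefix-comparable, which is what Invariant 5.12 establishes for the confirm components of summaries in allstate.

```python
def is_prefix(a, b):
    """Return True when sequence a is a prefix of sequence b."""
    return len(a) <= len(b) and tuple(b[:len(a)]) == tuple(a)


def lub(seqs):
    """Least upper bound, in the prefix order, of a collection of sequences.

    Raises ValueError if some pair of sequences is not prefix-comparable,
    in which case no least upper bound exists.
    """
    best = ()
    for s in seqs:
        s = tuple(s)
        if is_prefix(best, s):
            best = s                      # s extends the current bound
        elif not is_prefix(s, best):
            raise ValueError("sequences are not prefix-comparable")
    return best
```

For a prefix-comparable family the result is simply the longest member, so allconfirm is the longest confirmed prefix recorded in any summary.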

5.2 Correctness proof 

The correctness proof for to-impl follows the approach of the proof in [17]. The pattern of
reasoning is the same as the one used in that proof; however, there are differences due to the
distinct guarantees offered by dvs compared to those of VS. In particular, Invariant 5.6, which
corresponds to Lemma 6.18 of [17], requires a more subtle proof.

We start by providing some auxiliary invariants. 

Invariant 5.1 (to-impl) 

In any reachable state, if l ∈ domain(allcontent) and l.origin = p then
l < (current.id_p, nextseqno_p, p).

Proof: By induction on the length of the execution. The base case consists of proving that the
invariant is true in the initial state. In the initial state, no label is associated with any message;
hence the invariant is vacuously true.

For the inductive step, assume that the invariant is true in a reachable state s. We need
to prove that it is true in s' for any possible step (s, π, s'). The only step that can make the
assertion false is the step when a new label is associated with a message from a client; hence we
only need to consider π = LABEL(a)_p. The code of π shows that the new label is less than the
new (current.id_p, nextseqno_p, p) triple, since nextseqno_p is incremented after being used to create
the label. □
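
The comparison at the heart of this proof can be illustrated with a small sketch. It assumes that labels are modeled as Python tuples (view id, sequence number, origin) compared lexicographically, which is our reading of the label order; the helper name label_step is hypothetical and covers only the label-related part of the LABEL effect.

```python
def label_step(current_id, nextseqno, p):
    """Mimic the label-related effect of LABEL(a)_p.

    Returns the newly created label and the incremented nextseqno.
    """
    label = (current_id, nextseqno, p)   # let l be (current.id, nextseqno, p)
    return label, nextseqno + 1          # nextseqno := nextseqno + 1


# After the step, the new label lies strictly below the new triple.
label, nextseqno = label_step(current_id=3, nextseqno=1, p="p1")
print(label < (3, nextseqno, "p1"))
```

The new label and the new triple agree on the view identifier and differ first in the sequence number, so the strict inequality of Invariant 5.1 is preserved by the step.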

The following invariant says that when a process p has registered a view v, then any summary 
that p will create for a later view w will have its highprimary component equal to v or to a later 
view. 

Invariant 5.2 (to-impl) 

In any reachable state, let x be a summary, p ∈ P, and v, w ∈ created such that v.id ∈ registered_p,
w.id > v.id, and x ∈ allstate[p, w.id]. Then x.high ≥ v.id.

Proof: By induction on the length of the execution. The base case consists of proving that the
invariant is true in the initial state. In the initial state, there are no v and w that satisfy the
hypothesis. Hence the invariant is vacuously true.

For the inductive step, assume that the invariant is true in a reachable state s. We need to
prove that it is true in s' for any possible step (s, π, s'). The steps that can make the assertion
false are those that create new summaries or new views.

Let us first consider actions that create new summaries. When a new summary is created
without modifying the high component of existing summaries in allstate[p, w.id], then the assertion
cannot be made false. So we only need to consider action π = dvs-register_p when current_p = w.id,
which creates the first summary x that satisfies the hypothesis in s' (no such x existed in s). In
this case we have that x.high = w.id and, by hypothesis, w.id > v.id; hence we have
that x.high > v.id.

Consider now actions that create new views, that is π = dvs-newview(w)_p. We distinguish two
cases: (1) the only registered view is v_0, (2) there are registered views other than v_0. In case (1)
we have that y.high = g_0 for any existing summary y; hence the invariant is true. In case (2) the
invariant is true by the inductive hypothesis. □

Next we provide some other auxiliary invariants. 

Invariant 5.3 (to-impl) 

In any reachable state, if x ∈ allstate[p, g] then there exists v ∈ created and q ∈ v.set such
that: (1) x.high = v.id, (2) x.high ∈ registered_q, (3) x.ord = buildorder[q, x.high] and (4) either
x.high = g or current.id_q > v.id.

Proof: By induction on the length of the execution. The base case consists of proving that the
invariant is true in the initial state. In the initial state, for any such x we have that x.high = g_0
and x.ord = λ and thus we can take v = v_0 and q = p and the invariant is true.

For the inductive step, assume that the invariant is true in a reachable state s. We need to
prove that it is true in s' for any possible step (s, π, s'). If x ∈ s'.allstate[p, g], then in most
cases, there is y ∈ s.allstate[p, g] with y.high = x.high and y.ord = x.ord, to which we apply the
inductive hypothesis. The only case where this does not happen is when π = dvs-newview(v)_p,
where v.id = g, and x is the summary whose components are taken from the state of p. In this
case, there is y ∈ s.allstate[p, s.current_p] with y.high = x.high and y.ord = x.ord, to which we
apply the inductive hypothesis. □

Invariant 5.4 (to-impl) 

In any reachable state, if x ∈ allstate then there exists w ∈ created such that x.high = w.id, and
for all p ∈ w.set, p ∈ attempted[w.id].

Proof: By induction on the length of the execution. The base case consists of proving that the
invariant is true in the initial state. In the initial state, the invariant follows from the definition
of allstate (set w = current.id_p).

For the inductive step, assume that the invariant is true in a reachable state s. We need to prove
that it is true in s' for any possible step (s, π, s'). The only step that we have to worry about is
when a new summary is created. When a new summary x is created, x.high is set to the identifier
of the current view, and a message has been received from everyone in the membership. □

Invariant 5.5 (to-impl) 

In any reachable state, if v ∈ created, x ∈ allstate and x.high > v.id then there exists p ∈ v.set
with current.id_p > v.id.

Proof: Fix v, x as given. Invariant 5.4 shows the existence of w ∈ created such that x.high = w.id,
and for all p ∈ w.set, p ∈ attempted[w.id]. Then Invariant 3.4 implies that there exists p ∈ v.set
with current-viewid[p] > v.id. But current-viewid[p] = current.id_p, which yields the result. □

Next we provide the crucial invariant corresponding to Lemma 6.18 of [17]. This invariant
has a more subtle proof than the one given in Lemma 6.18 of [17]. That proof does not work
in the setting of dvs because dvs guarantees a weaker intersection property (each primary view
intersects only the primary views in between the preceding and the following totally registered
primary views). The new proof also uses the fact about dvs that once a view is totally attempted,
no views with lower identifiers can be registered.

Invariant 5.6 (to-impl) 

In any reachable state, suppose that v ∈ created, a ∈ seqof(L), and for every p ∈ v.set, the
following is true: if current.id_p > v.id then v.id ∈ registered_p and a ≤ buildorder[p, v.id].
Then for every x ∈ allstate with x.high > v.id, we have that a ≤ x.ord.

Proof: By induction on the length of the execution. The base case consists of proving that the
invariant is true in the initial state. In the initial state, the only created view is v_0, and there is
no x ∈ allstate with x.high > g_0. So the statement is vacuously true.

For the inductive step, assume that the invariant is true in a reachable state s. We need to
prove that it is true in s' for any possible step (s, π, s'). So fix v ∈ s'.created and a, and assume that
for every p ∈ v.set, if s'.current.id_p > v.id then v.id ∈ s'.registered_p and a ≤ s'.buildorder[p, v.id].

If v ∉ s.created, then π must be createview(v). Then v.id ∉ s'.registered_p for all p. Fix
x ∈ s'.allstate and suppose that x.high > v.id. Then Invariant 5.5 applied to s' implies that there
exists p ∈ v.set with s'.current.id_p > v.id; fix such a p. Then the hypothesis part of the invariant
for s' implies that v.id ∈ s'.registered_p, a contradiction. It follows that v ∈ s.created.

As usual, the interesting steps are those that convert the hypothesis from false to true, and 
those that keep the hypothesis true while converting the conclusion from true to false. 

In this case, there are no steps that convert the hypothesis from false to true: if there is
some p ∈ v.set for which s.current.id_p > v.id and either v.id ∉ s.registered_p or a is not a
prefix of s.buildorder[p, v.id], then also s'.current.id_p > v.id (the id never decreases) and either
v.id ∉ s'.registered_p or a is not a prefix of s'.buildorder[p, v.id]. (These two cases carry over, since
s.current.id_p > v.id implies that v.id cannot be inserted into registered_p and buildorder[p, v.id]
cannot change.)

So it remains to consider any steps that keep the hypothesis true while converting the conclusion
from true to false. Thus, we assume that if s.current.id_p > v.id then v.id ∈ s.registered_p and
a ≤ s.buildorder[p, v.id]. Suppose that x ∈ s'.allstate and x.high > v.id. If also x ∈ s.allstate
then we can apply the inductive hypothesis, which implies that a ≤ x.ord, as needed. So the only
concern is with steps that produce a new summary.

Any step that produces the new summary x by modifying an old summary x' ∈ s.allstate,
in such a way that x'.ord ≤ x.ord and x'.high = x.high, is easy to handle: for such a step,
x'.high > v.id and so the inductive hypothesis implies that a ≤ x'.ord ≤ x.ord, as needed. So the
only concern is with dvs-register_p steps that produce a new summary x from the state-exchange
messages of a view w sent to some processor p. Thus x.high = w.id. Let x' be the summary of
q' = chosenrep in s'.gotstate. We claim x'.high ≥ v.id.

To prove the claim, we let v' denote the unique element with highest view identifier among the
elements of s'.created such that v'.id < w.id and v' is totally registered in s'. Let v'' denote either
v' or v, whichever has the higher view identifier. Invariant 3.3 shows that w.set ∩ v''.set ≠ {}, no
matter whether v'' is v or v'. Fix any element q'' in w.set ∩ v''.set.

Recall that the precondition status = established of dvs-register implies that domain(s'.gotstate_p) =
w.set, so by the code q'' ∈ domain(s.gotstate_p). Let x'' be the summary s.gotstate(q'')_p; we have
x'' ∈ s.allstate[q'', w.id].

We now show that v''.id ∈ s.registered_{q''}. We consider two cases:

1. v'' = v'.

Then q'' ∈ v'.set so by definition of v', we have that v'.id ∈ s.registered_{q''}.

2. v'' = v.

Because s.allstate[q'', w.id] is non-empty, we have that s.current.id_{q''} ≥ w.id. We have
that x.high > v.id by assumption, and x.high = w.id by the code; therefore, w.id > v.id. So
also s.current.id_{q''} > v.id. Recall that we are in the case where the hypothesis of this invariant
is true. Therefore, by this hypothesis (using q'' ∈ v.set), we obtain that v.id ∈ s.registered_{q''}.

By Invariant 5.2 (applied with q'' replacing p) we obtain x''.high ≥ v''.id. By the definition of
q' as a member that maximizes the high component in the summaries recorded in s'.gotstate, we
have x'.high ≥ x''.high. Therefore x'.high ≥ v''.id ≥ v.id, completing our proof of the claim.

If x'.high > v.id then we can apply the inductive hypothesis to x' and we are done, since
x'.ord ≤ x.ord. So suppose x'.high = v.id. Note that x' ∈ s.allstate[q', w.id]. By Invariant 5.3
there must exist² q ∈ v.set so that v.id ∈ s.registered_q, x'.ord = s.buildorder[q, v.id], and (either
x'.high = w.id or s.current.id_q > v.id). Since x'.high = v.id < x.high = w.id, the last property
can be simplified to s.current.id_q > v.id. By monotonicity of current, we have s'.current.id_q >
v.id. The hypothesis of this invariant says that this forces a ≤ s'.buildorder[q, v.id]. Since
x'.ord ≤ x.ord by the code for this event, and x'.ord = s.buildorder[q, v.id] as shown above, and
s.buildorder[q, v.id] = s'.buildorder[q, v.id] since q is not currently in view v, we get a ≤ x.ord,
which is what we need. □

Next we provide some additional auxiliary invariants. 

Invariant 5.7 (to-impl) 

In any reachable state, if we have that v ∈ created, a ∈ seqof(L), and for every p ∈ v.set,
v.id ∈ registered_p and a ≤ buildorder[p, v.id], then for every x ∈ allstate with x.high ≥ v.id,
a ≤ x.ord.

Proof: There are two possible cases: (1) x.high > v.id, (2) x.high = v.id. In case (1) we can
apply Invariant 5.6. Consider case (2). Then we apply Invariant 5.3 to x, which gives v' ∈ created
and q' ∈ v'.set such that x.high = v'.id, x.high ∈ registered_{q'}, and x.ord = buildorder[q', x.high].
Since v.id = v'.id, Lemma 4.1 shows v = v'. Substituting in the facts above we see x.ord =
buildorder[q', v.id]. Since q' ∈ v.set, we can apply the premise of the invariant to see that a ≤
buildorder[q', v.id]; that is, a ≤ x.ord, as required. □

The next invariant makes precise the fact that a label is in safe-labels_p only after it (and all
prior labels in order_p) were placed in order_q at every member q of current.set_p.

Invariant 5.8 (to-impl)

In any reachable state, if l ∈ safe-labels_p and a is a prefix of order_p that is terminated by l, then
for all q ∈ current.set_p, a ≤ buildorder[q, current.id_p].

² Direct application of the invariant actually shows the existence of some view v̂ and q ∈ v̂.set, but
since x'.high = v̂.id and also x'.high = v.id, uniqueness of view identifiers shows we may take v̂ to be v itself.

The next invariant shows that in any summary, the ord component is closed under the relation
"sent-before-by-one-client".

Invariant 5.9 (to-impl)

In any reachable state, the following is true. Assume l, l' ∈ L and i' ∈ N^{>0}. If l, l' ∈ domain(allcontent)
and l.origin = l'.origin and l < l' and x ∈ allstate and l' = x.ord(i') then there exists i such that
i < i' ∧ l = x.ord(i).

The proofs of Invariants 5.8 and 5.9 are left as exercises. Next we show that x.confirm is a
prefix of a known sequence. This shows the consistency of the confirmed sequence of labels at
different places in the system.

Invariant 5.10 (to-impl) 

In any reachable state, if x ∈ allstate then

1. There exists v ∈ created such that v.id ≤ x.high and for every q ∈ v.set, v.id ∈ registered_q
and x.confirm ≤ buildorder[q, v].

2. x.next ≤ length(x.ord) + 1.

Proof: By induction on the length of the execution. The base case consists of proving that the
invariant is true in the initial state. In the initial state, the only created view is v_0 and the only
extant summary is ({}, λ, 1, g_0); it is easy to verify that the invariant is true.

For the inductive step, assume that the invariant is true in a reachable state s. We need to
prove that it is true in s' for any possible step (s, π, s').

For most of the steps, there is y in s.allstate so that y.next = x.next, y.ord = x.ord (and
hence y.confirm = x.confirm) and also y.high = x.high. In these cases, the inductive hypothesis
gives us what we want, since buildorder[q, v] increases monotonically through an execution.

The steps that are left to consider are confirm_p, dvs-gprcv((l, a))_{q,p} and dvs-register_p.

Consider the case π = confirm_p. If x is not the summary from the state of p in s', the invariant
follows from the inductive hypothesis. If x is the summary from the state of p in s', the
precondition of π shows that the newly confirmed message has its label in s.safe-labels_p. By
Invariant 5.8, taking v to be s.current_p = x.high, we have part 1 of the invariant. The precondition
of π also gives (x.next - 1) ∈ domain(x.ord), thus showing part 2 of the invariant.

Now consider π = dvs-gprcv((l, a))_{q,p}. As before, if x is not the summary from the state of p
in s', the invariant follows from the inductive hypothesis. If x is the summary from the state
of p, let y denote the summary taken from p in state s. The code shows that x.high = y.high,
x.next = y.next, and x.ord is an extension of y.ord. By part 2 applied to y, we see that y.next ≤
length(y.ord) + 1 and therefore x.next ≤ length(x.ord) + 1, which proves part 2 for x; it also
shows that x.confirm = y.confirm, so that the inductive hypothesis of part 1 applied to y proves
part 1 for x.

Finally consider π = dvs-register_p. As in the other two cases, if x is not the summary from the
state of p in s', the invariant follows from the inductive hypothesis. If x is the summary from the
state of p, let y denote the summary, among those in gotstate_p, with the highest value for y.next.
The code shows that x.next = y.next. Summary y is in s.allstate. The inductive hypothesis
shows that y.confirm has length y.next - 1, and that there is v ∈ s.created such that v.id ≤ y.high
and for every q ∈ v.set it holds that v.id ∈ s.registered_q and y.confirm ≤ buildorder[q, v]. Now let
z denote the summary of chosenrep(gotstate), as calculated in the effect of π. Since z.high ≥ y.high ≥ v.id
(recall the definition of z as being from a representative, that is, having maximal highprimary
among summaries in gotstate), Invariant 5.7 shows that y.confirm ≤ z.ord. Since z.ord ≤ x.ord
by the code, we deduce that y.confirm is a prefix of x.ord; as length(y.confirm) = y.next - 1 =
x.next - 1, we have x.confirm = y.confirm. Also by the code we have x.high ≥ y.high. Thus the
inductive hypothesis applied to y, along with the monotonicity of the set created and the fact
that v.id ∈ registered_q, gives the invariant for x. □

Invariant 5.11 (to-impl) 

In any reachable state, if x_1, x_2 ∈ allstate and x_1.high ≤ x_2.high, then x_1.confirm ≤ x_2.ord.

Proof: By Invariant 5.10, part 1, with x = x_1, we have that there exists v such that v.id ≤ x_1.high
and x_1.confirm ≤ buildorder[q, v] for every q ∈ v.set. By Invariant 5.7 used with a = x_1.confirm,
since x_2.high ≥ v.id we have that the conclusion of Invariant 5.7 holds for x_2. Hence
x_1.confirm ≤ x_2.ord. □

Invariant 5.12 (to-impl) 

In any reachable state, for any x, x' ∈ allstate, either x.confirm ≤ x'.confirm or x'.confirm ≤
x.confirm.

Proof: Without loss of generality, we can assume that x.high ≤ x'.high. From Invariant 5.11, we
have that both x.confirm and x'.confirm are prefixes of x'.ord. Two prefixes of the same sequence
are comparable in the prefix order, which gives the result. □

To prove that to-impl implements TO, we define a function from the reachable states of
to-impl to the states of TO and prove that it is an abstraction function. This function, called T_TO,
is defined exactly as in [17] and it is given in Figure 9.



Let s be a state of to-impl. The state t = T_TO(s) of TO is the following.

1. t.queue = applytoall(⟨s.allcontent, origin⟩, s.allconfirm),
   where the selector origin is regarded as a function from labels to processors.

2. t.next[p] = s.nextreport_p.

3. t.pending[p] = applytoall(s.allcontent, b) · s.delay_p where b is the sequence of labels such that

   (a) range(b) is the set of labels l such that l.origin = p, (l, a) ∈ s.allcontent for some a,
       and l ∉ range(s.allconfirm).

   (b) b is ordered according to the label order.

Figure 9: The abstraction function T_TO.
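
The construction in Figure 9 can be sketched in Python as follows. This is only an illustration under assumptions that are ours: allcontent is represented as a dictionary from labels to messages, labels are tuples (view id, sequence number, origin) ordered lexicographically, allconfirm is a list of labels, and nextreport and delay are dictionaries keyed by process id. The function names are hypothetical.

```python
def apply_to_all(f, seq):
    """Apply f to each element of seq, preserving order."""
    return [f(x) for x in seq]


def abstraction(allcontent, allconfirm, nextreport, delay):
    """Compute the TO state (queue, next, pending) from derived variables."""
    # 1. queue: (message, origin) pairs for the confirmed labels, in order.
    queue = apply_to_all(lambda l: (allcontent[l], l[2]), allconfirm)

    # 2. next[p] is simply nextreport_p.
    nxt = dict(nextreport)

    # 3. pending[p]: messages of unconfirmed labels originating at p,
    #    in label order, followed by the messages still in delay_p.
    confirmed = set(allconfirm)
    pending = {}
    for p in delay:
        b = sorted(l for l in allcontent if l[2] == p and l not in confirmed)
        pending[p] = apply_to_all(allcontent.get, b) + list(delay[p])

    return {"queue": queue, "next": nxt, "pending": pending}
```

Under this reading, a LABEL step moves a message from the delay suffix of pending[p] into its labeled prefix without changing the concatenation, which is why such steps map to no change in the TO state.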

In order to prove that T_TO is an abstraction function we need to prove that (a) for any initial
state s of to-impl we have that T_TO(s) is an initial state of TO, and that (b) for any possible
step π of to-impl there exists an execution fragment α of TO such that the trace of α is equal to
the trace of π, that is, α and π have identical externally observable behaviors. Lemmas 5.13 and
5.14 prove this.

Lemma 5.13 If s is an initial state of to-impl then T_TO(s) is an initial state of TO.

Proof: Let s be the initial state of to-impl. Let t = T_TO(s). By the definition of T_TO we have
that t.queue = λ, and that t.next[p] = s.nextreport_p = 1 and t.pending[p] = λ for any p. Hence t is an
initial state of TO. □

Lemma 5.14 Let s be a reachable state of to-impl, T_TO(s) a reachable state of TO, and (s, π, s')
a step of to-impl. Then there is an execution fragment α of TO that goes from T_TO(s) to T_TO(s'),
such that trace(α) = trace(π).

Proof: By case analysis based on the type of the action π. Define t = T_TO(s) and t' = T_TO(s').

1. π = BCAST(a)_p

Since π is an input to TO, π is enabled in t. The effect of π shows that s'.allconfirm =
s.allconfirm, s'.allcontent = s.allcontent, and s'.delay_p = s.delay_p + a. This implies
that t'.pending[p] = t.pending[p] + a, thus showing that (t, π, t') is a step of TO. Hence we set
α = π. Clearly trace(α) = trace(π).

2. π = LABEL(a)_p

We need to show that t = t'. The effect of π shows that s'.allconfirm = s.allconfirm, and
s'.allcontent is the union of s.allcontent with (l', a) where l' = (s.current.id_p, s.nextseqno_p, p);
by Invariant 5.1, this new label l' is greater than all labels with origin p in the domain of
s.allcontent. Thus let us consider the sequence of labels b' (arranged in label order) such that
range(b') is the set of labels l such that l.origin = p, (l, a') ∈ s'.allcontent for some a', and
l ∉ range(s'.allconfirm). We see that b' is related to the sequence b (defined the same
way but using s instead of s') by b' = b + l'. Therefore applytoall(s'.allcontent, b') =
applytoall(s.allcontent, b) + a. On the other hand, the precondition of π shows that a is
the head of s.delay_p, and so the effect of π means s.delay_p = a + s'.delay_p. Thus, t'.pending[p]
is the same as t.pending[p], because in the concatenation that defines this component, the
element a is simply transferred from suffix to prefix. Therefore t' = t. Hence we set α = t.

3. π = CONFIRM_p

Clearly the effect of π shows s.allcontent = s'.allcontent.

If s.nextconfirm_p ≤ length(s.allconfirm) then Invariant 5.12 and the effect of π show that
s'.allconfirm = s.allconfirm, so that t = t'. In this case we set α = t.

Otherwise s.nextconfirm_p = length(s.allconfirm) + 1, so the effect of π shows that s'.allconfirm =
s.allconfirm · ⟨l⟩ where l = s.order_p(s.nextconfirm_p). Let q = l.origin and a = s.allcontent(l).
We claim that (t, TO-ORDER(a, q), t') is a step of TO.

We first show that TO-ORDER(a, q) is enabled in t. We have l ∈ domain(s.allcontent) and
l ∉ setof(s.allconfirm); this means that a is an element of the sequence t.pending[q]. Also by
Invariant 5.9, any lower label with origin q is in s.confirm_p and so in s.allconfirm. Since the
sequence b used to define t.pending[q] is arranged by label, we see that l is the head of b,
and so a is the head of t.pending[q], as required. Further, the equation above for s'.allconfirm
shows that t'.queue = t.queue + (a, q), and this is what is needed to show that this step takes
t to t'.

Hence we set α = TO-ORDER(a, q). Clearly trace(α) = trace(π) = λ.

4. π = DVS-GPRCV(x)_{q,p}

In some cases this may change the value of nextconfirm_q, but in every situation it leaves
allconfirm unchanged (it only moves nextconfirm_q to a value already somewhere in allstate).
Thus t' = t. Hence we set α = t.

5. π = BRCV(a)_{p,q}

We need to show that π is enabled in t as an action of TO. This is immediate from the fact that
π is enabled in s as an action of to-impl. Similarly, the effect corresponds (only nextreport_q
is altered).

Hence we set α = BRCV(a)_{p,q}. Clearly trace(α) = trace(π).

6. Remaining actions.

The other actions leave t' = t. Hence we set α = t.

□

Lemmas 5.13 and 5.14 prove that T_TO is an abstraction function from to-impl to TO and
thus the following theorem holds (this is a standard inference, cf. [29]).

Theorem 5.15 Every trace of to-impl is a trace of TO. 



6 Discussion 

We presented a specification for a dynamic primary view group communication service and an 
algorithm that formally implements the service, and we showed the utility of our new specification 
by using it to implement a totally ordered broadcast. This work deals entirely with safety proper- 
ties; future work could consider performance and fault-tolerance properties using the conditional 
performance analysis as presented in [17]. It also remains to study other applications of our dvs 
specification, such as replicated data applications and load-balancing applications. 

Another interesting exploration direction considers variations on the DVS specification, for 
example, one in which the state exchange at the beginning of a new view is supported by the 
dynamic view service. We are currently studying variations on our specifications that are more 
specifically tuned to systems like Isis and Ensemble. In particular, we would like to understand 
the power of the Isis requirement that processes that move together from one view to the next 
receive exactly the same messages in the first view, especially for coherent-data applications. 

In a related work [11] we also have investigated a generalization of the dvs service to dynamic 
sets of primaries rather than individual primaries, in order to tolerate transient failures during a 
particular view. 

Acknowledgments: We thank Ken Birman, who urged us to consider the interesting issues of 
dynamic views. We also thank Idit Keidar and Robbert van Renesse for discussions about our 
dvs specification and our algorithm models and proofs. 